1. Introduction

In a distributed database system a transaction may be processed concurrently at several different processors. To maintain the integrity of the database these processors must take consistent action regarding the transaction: either the results of the transaction are installed in the database at all processors (the transaction is committed), or the results are installed at no processor (the transaction is aborted). The decision whether to abort or commit a transaction is made by a transaction commit protocol. The objective for such a protocol is to commit as many transactions as possible subject to the constraint that each processor must be able to abort a transaction unilaterally.

A transaction commit protocol must never produce inconsistent decisions, and it must allow unilateral aborts. It has some leeway, though. Some protocols can produce more aborts than others, and some protocols fail to terminate in some situations. If failures can cause some nonfaulty processors to remain undecided about the fate of a transaction (at least as long as the failure persists), a processor is said to block, and the protocol is called blocking. Otherwise, the protocol is nonblocking. The most common transaction commit protocol in practice, two-phase commit, is a blocking protocol. In real systems a blocking protocol is preferable to one that allows inconsistent decisions to be made, since it allows consistent decisions to be reached after the failures are repaired. A nonblocking protocol would be preferable still.

Many elegant nonblocking transaction commit protocols [S] [DS] have been developed for completely synchronous systems. An obstacle to using these protocols in real systems is that a single violation of the timing assumptions (i.e., a late message) can cause the protocol to produce the wrong answer. The most common alternative timing model, the completely asynchronous model, unfortunately does not allow any solution to the transaction commit problem, either randomized or deterministic.¹ We give a new timing model that is intermediate between the synchronous and asynchronous models previously studied. In this model, we give a new nonblocking transaction commit protocol.


¹ The intuition behind this impossibility result is the following. Suppose there is a protocol that works in an asynchronous system and guarantees that nonfaulty processors eventually decide (with probability 1); that if the processors all begin with commit and there are no failures, then they all decide commit; and that if any processor begins with abort, then the nonfaulty processors decide abort. Consider a run in which all processors but p begin with commit and are nonfaulty, while p fails initially. Eventually, the rest of the processors must decide. Since p could have started with abort, the processors must decide abort. But there is another run that looks identical up to the decision point to all the processors except p, in which p begins with commit and all its messages are delayed until after the decision is made. In this run, the decision should have been commit.

We model real systems in which messages are usually delivered within some known time bound but sometimes arrive late. We do this by assuming a completely asynchronous system, in which relative processor speeds are unbounded and messages can take arbitrarily long to arrive, and by letting the timing behavior affect the correctness conditions for the transaction commit problem, as follows. If every processor initially wants to commit the transaction, then the common decision must be to commit, provided no processors fail and all messages arrive within some known fixed time bound. If any processor initially wants to abort the transaction, then the common decision must be to abort, no matter what the timing and fault behavior of the system is. This problem definition takes advantage of the leeway allowed in specifying when processors must commit. Assuming that failures and late messages are relatively rare, the overall progress of the transaction processing system will not be impeded very much. A similar division is made in [DLS], in which properties that must always hold are separated from properties that need only hold when the system is well-behaved. In most other respects our model differs from theirs.

We prove that in our model no transaction commit protocol can terminate in a bounded expected number of steps. Consequently, a new measure is needed to analyze the time performance of our protocol. One of the contributions of this paper is such a measure, which we call an asynchronous round. Our definition of asynchronous round is strong enough to allow us to show that our protocol terminates in a small constant expected number of asynchronous rounds. In Section 2 we argue that this notion of asynchronous round is not unrealistically strong.

Randomization is needed in the protocol because a result of [DDS] implies that no deterministic protocol is possible. In order to analyze a randomized protocol, we must define the adversaries against which the protocol will work. Our notion of the adversary is drawn from [CMS]. The adversary in our model chooses the order in which processors take steps, when each message will be delivered, and which processors fail and when (as long as fewer than half fail). It makes these decisions dynamically, during the execution of the protocol, using unlimited computational power. The adversary has available at any point in the execution all information about the hardware and software of the processors, and the pattern of communication up to that time, but it does not know the contents of the messages sent, nor the local states of processors, nor the processors' local random choices, unless that information is deducible from the pattern of communication. We will be careful to design our protocol so that it is not deducible.

Our  protocol  uses  a  modified  version  of  a  solution  to  the  agreement  problem. 
In  the  agreement  problem  each  processor  begins  with  an  initial  value,  0  or  1,  and 
decides  on  a  final  value.  All  nonfaulty  processors’  final  values  must  be  equal,  and  if 
all  processors  have  the  same  initial  value,  then  that  value  must  be  the  final  value. 
Thus  if  one  processor  begins  with  0  and  the  rest  with  1,  either  0  or  1  is  a  correct 
answer  to  the  agreement  problem,  whereas  in  the  transaction  commit  problem,  the 
answer  must  be  0  (if  0  is  identified  with  abort). 
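The contrast between the two problems can be made concrete with a small sketch. The helper names are ours, not the paper's; values are binary with 0 identified with abort, and the `well_behaved` flag is an assumption standing in for "failure-free and on-time", the condition under which the problem definition given earlier requires a commit.

```python
def agreement_outputs(initial):
    """Decision values the agreement problem allows, given binary initial values."""
    # If all initial values are equal, that value is forced; otherwise both
    # 0 and 1 are acceptable. With binary values this is the set of inputs.
    return set(initial)


def commit_outputs(initial, well_behaved=True):
    """Decision values transaction commit allows (0 = abort, 1 = commit).

    well_behaved (hypothetical flag) stands for "the run is failure-free and
    on-time", the only case in which unanimous commit votes force a commit.
    """
    if 0 in initial:
        return {0}          # a single abort vote forces abort
    return {1} if well_behaved else {0, 1}


print(agreement_outputs([0, 1, 1]))   # {0, 1}
print(commit_outputs([0, 1, 1]))      # {0}
```

The last case, `commit_outputs([1, 1, 1], well_behaved=False)`, returns `{0, 1}`: this is the leeway the timing model exploits.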

An important difference between the transaction commit problem and the agreement problem is that in the former, all processors that decide are required to agree, including processors that decide and subsequently fail. This strict agreement condition is imposed because we assume that failed processors will eventually recover. The hope is that processors that fail and subsequently recover can be reintegrated using a separate recovery protocol. Skeen's thesis has an excellent discussion of recovery protocols [S]. We do not discuss these protocols further in this paper.

We  assume  that  the  faulty  processors  fail  by  crashing  (i.e.,  stopping  without 
warning).  This  is  a  realistic  assumption  that  is  commonly  made  in  the  database 
literature  [S].  The  number  of  faults  tolerated  by  our  protocol  is  optimal,  since  we 
prove  a  matching  lower  bound.  Our  protocol  works  as  long  as  more  than  half  the 
processors  are  nonfaulty.  An  important  property  of  our  protocol  is  that  it  degrades 
gracefully  if  the  bound  on  the  number  of  faulty  processors  is  exceeded  —  instead  of 
producing  a  wrong  answer,  the  protocol  simply  fails  to  terminate. 

At the beginning of our protocol, processors exchange some messages and then execute a modification of Ben-Or's asynchronous agreement protocol [Be] to decide the fate of the transaction. The preliminary message exchanges serve two purposes: first, the differences between the input-output relations for the transaction commit and agreement problems are resolved, and second, a number of identical random bits are distributed.¹ These identical random bits are used in the agreement protocol to lower the expected running time from exponential to constant. There is a body of work dealing with attaining constant expected running time for the agreement problem [R] [CMS]; our technique does not solve this problem, for the following reason. In our protocol, if the identical random bits are not distributed in a timely fashion, processors can unilaterally decide 0 (abort), because we are solving the transaction commit problem. Such action is not an option for processors trying to solve the agreement problem, because it could violate the condition that all processors decide 1 if they all start with 1.

¹ We have not solved the global coin toss problem, however, because our protocol does not guarantee that the identical random bits are successfully distributed; the nature of the transaction commit problem, as discussed above, is such that our protocol can tolerate this failure.

The  transaction  commit  protocols  of  Skeen  [S]  and  Dwork  and  Skeen  [DS] 
tolerate  any  number  of  processor  faults,  while  our  protocol  only  handles  fewer  than 
half  of  the  processors  failing.  However,  if  half  or  more  of  the  processors  fail,  our 
protocol  does  not  produce  a  wrong  answer  but  merely  fails  to  terminate,  leaving 
open  the  opportunity  for  processors  to  recover.  Late  messages  are  not  a  problem 
for  our  protocol  because  of  our  model,  but  as  we  noted  earlier  they  can  cause  the 
protocols  in  [S]  and  [DS]  to  produce  a  wrong  answer. 

In summary, the principal contributions of this paper are a realistic timing model, a method for analyzing the time performance of protocols in this model, an efficient fault-tolerant protocol for the transaction commit problem, and lower bounds showing that the protocol has optimal fault tolerance and that no protocol can terminate in a constant expected number of steps for each processor.

Following an exposition of our formal model in Section 2, we present our randomized transaction commit protocol in Section 3. Section 4 contains the lower bound proof showing that our protocol tolerates the maximal number of faulty processors. Finally, in Section 5 we show that no transaction commit protocol can guarantee that each processor terminates in a bounded expected number of its own steps, even if processors are synchronous.

2.  Model 

Processors are modeled as state machines that communicate by sending messages. Messages can take arbitrarily long to arrive. Our protocol works even in a very weak model in which there is no bound on the relative frequency with which processors take steps, and in which there is no atomic broadcast of messages. Our lower bound results are shown for the stronger case in which processors run in lockstep synchrony and possess atomic broadcast. In this section we present the weaker model. In Sections 4 and 5 we indicate the necessary changes for the stronger model. Our model is similar to those in [FLP] and [DDS].

Throughout  this  paper,  1  is  identified  with  “commit”  and  0  with  “abort.” 
2.1 Basic Model

A  raw  message  consists  of  some  text,  and  the  names  of  the  sending  and  receiving 
processors.  A  message  is  a  (raw  message,  integer)  ordered  pair;  the  integer  denotes 
the  sending  time,  as  will  be  explained  later.  The  reason  for  distinguishing  between 
messages  and  raw  messages  is  that  we  do  not  wish  to  require  timestamps  on  all 
(raw)  messages  sent  by  processors,  yet  this  information  is  useful  in  the  exposition 
of  the  model  for  distinguishing  multiple  instances  of  the  same  raw  message  and 
determining  message  delays. 


A  processor  is  an  infinite  state  machine,  together  with  a  message  buffer,  and  a 
random  number  generator.  The  message  buffer  holds  messages  that  have  been  sent 
to  the  processor  but  not  yet  received,  and  is  modeled  as  a  set  of  messages.  The 
random  number  generator  supplies  an  infinite  sequence  of  n-bit  strings.  The  state 
machine’s  transition  function  uses  the  current  state,  current  random  bit  string  and 
set  of  raw  messages  received  to  compute  the  new  state  and  raw  messages  to  be  sent. 
Certain states are initial states, designated (id, initval), where id is a nonnegative integer and initval is either 0 or 1. The id element of the initial state is the processor's name, or identification number. The initval element is the processor's initial value. Each processor can send zero or one message to every processor in one step. There is an integer in each processor's state, called its clock, which is 0 in an initial state and is always incremented by 1 by the transition function. Thus, the clock counts how many steps the processor has taken so far. A protocol is a set of n processors.

A  configuration  C  consists  of  n  states,  one  for  each  processor,  and  n  sets  of 
messages,  one  for  each  processor’s  buffer.  An  initial  configuration  has  all  processors 
in  initial  states  and  all  buffers  equal  to  the  empty  set. 

An event is denoted (p, M, b), in which processor p receives the set of messages M (which can be empty) and the random bit string b.

An event e = (p, M, b) is applicable to configuration C if every message in M is an element of p's buffer in C. Let s and M' be the state and set of raw messages resulting from applying p's transition function to p's state in C, b, and the raw messages extracted from M. The configuration resulting from applying e to C, denoted e(C), is obtained from C by removing all messages in M from p's buffer, changing p's state to s, and adding the message $(m, i)$, for each $m \in M'$, to the appropriate buffer, where i is the value of p's clock in s.
A schedule is a finite or infinite sequence of events. A finite schedule $\sigma = e_1 e_2 \ldots e_k$ is applicable to configuration C if $e_1$ is applicable to C, $e_2$ is applicable to $e_1(C)$, etc. The resulting configuration is denoted $\sigma(C)$. An infinite schedule is applicable to C if every finite prefix of the schedule is applicable to C.

Given configuration $C_1$ and schedule $\sigma$ applicable to $C_1$, we define the run $R = \mathrm{run}(C_1, \sigma)$ obtained from $C_1$ and $\sigma$ as follows. If $\sigma = e_1 e_2 \ldots e_k$ is finite, then R is the sequence $C_1 e_1 C_2 e_2 \ldots e_k C_{k+1}$, where $C_{i+1} = e_i(C_i)$ for $1 \le i \le k$. If $\sigma = e_1 e_2 \ldots$ is infinite, then R is the sequence $C_1 e_1 C_2 e_2 \ldots$, where, for all i, $C_1 e_1 C_2 e_2 \ldots e_i C_{i+1} = \mathrm{run}(C_1, e_1 e_2 \ldots e_i)$. We also denote $\sigma$ by sched(R). Informally, a run is a schedule together with its associated configurations.
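The definitions of events, configurations, schedules and runs can be exercised with a minimal Python sketch. It rests on assumptions that are ours, not the paper's: states are dicts carrying `id` and `clock`, a raw message is a `(text, sender, receiver)` tuple, and `toy_delta` is a made-up transition function used only to drive `apply_event` and `run`.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set, Tuple

RawMessage = Tuple[str, int, int]      # (text, sender, receiver): hypothetical encoding
Message = Tuple[RawMessage, int]       # (raw message, sender's clock after the sending step)
# Hypothetical transition-function type: (state, bit string, raw messages) -> (state, raw messages)
Delta = Callable[[dict, str, Set[RawMessage]], Tuple[dict, Set[RawMessage]]]


@dataclass(frozen=True)
class Event:
    p: int                       # processor taking the step
    M: FrozenSet[Message]        # messages received (possibly empty)
    b: str                       # random bit string consumed by the step


@dataclass
class Configuration:
    states: List[dict]           # one state per processor; each has "id" and "clock"
    buffers: List[Set[Message]]  # one message buffer per processor


def apply_event(e: Event, C: Configuration, delta: Delta) -> Configuration:
    """Return e(C); raise ValueError if e is not applicable to C."""
    if not e.M <= C.buffers[e.p]:
        raise ValueError("event not applicable: M is not contained in p's buffer")
    raw_in = {raw for (raw, _clock) in e.M}
    s, sent = delta(C.states[e.p], e.b, raw_in)
    states = list(C.states)
    states[e.p] = s
    buffers = [set(buf) for buf in C.buffers]
    buffers[e.p] -= e.M
    for raw in sent:
        buffers[raw[2]].add((raw, s["clock"]))   # timestamp = p's clock in s
    return Configuration(states, buffers)


def run(C1: Configuration, schedule: List[Event], delta: Delta) -> list:
    """run(C1, sigma) for a finite schedule: [C1, e1, C2, e2, ..., ek, C_{k+1}]."""
    R, C = [C1], C1
    for e in schedule:
        C = apply_event(e, C, delta)
        R += [e, C]
    return R


N = 3  # number of processors in the toy example


def toy_delta(state, b, raw_in):
    """Made-up transition: count the step, record received raw messages,
    and on the very first step send "hi" to every other processor."""
    s = dict(state, clock=state["clock"] + 1, got=state["got"] | frozenset(raw_in))
    out = set()
    if state["clock"] == 0:
        out = {("hi", state["id"], q) for q in range(N) if q != state["id"]}
    return s, out


C1 = Configuration([{"id": i, "clock": 0, "got": frozenset()} for i in range(N)],
                   [set() for _ in range(N)])
e1 = Event(0, frozenset(), "000")
C2 = apply_event(e1, C1, toy_delta)
e2 = Event(1, frozenset(C2.buffers[1]), "101")   # processor 1 receives 0's message
R = run(C1, [e1, e2], toy_delta)
```

Attaching the sender's post-step clock to each message, rather than to the raw text, mirrors the paper's reason for separating messages from raw messages: delays can be measured without requiring timestamps in what processors actually send.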

Processor  p  is  nonfaulty  in  an  infinite  run  or  schedule  if  it  takes  an  infinite 
number  of  steps;  otherwise  it  is  faulty.  An  infinite  run  or  schedule  is  failure-free 
if  no  processor  is  faulty  in  it.  Since  the  interleaving  of  processors’  steps  in  a  run 
or  schedule  may  be  arbitrary,  no  particular  degree  of  synchronization  is  necessarily 
achieved. 

A  message  sent  by  processor  p  at  event  e  in  infinite  run  R  is  guaranteed  if  e  is 
not  the  last  step  of  p  in  R.  An  infinite  run  R  is  t-admissible ,  for  0  <  t  <  n,  if 

•  the  first  configuration  is  an  initial  configuration, 

•  at  most  t  processors  are  faulty,  and 

•  all  guaranteed  messages  sent  to  nonfaulty  processors  are  eventually  received. 

The notion of guaranteed messages is used to model the lack of atomic broadcast. Since messages sent at a processor's last step do not have to be received, we effectively model a processor failing in the middle of a broadcast.

There are two disjoint sets of decision states, $Y_0$ and $Y_1$, such that if a processor enters a state in $Y_0$ or $Y_1$ it stays in that set forever. A processor decides v when it is in a state in $Y_v$. A run is deciding if every nonfaulty processor decides. A configuration C has decision value v if there is some processor whose state in C is an element of $Y_v$.

2.2  Timing  Constraints 

We fix a positive constant $K > 1$, which is used to define late messages. A message m from p to q is late in run $R = C_1 e_1 C_2 e_2 \ldots$ if event $e_i$ adds m to q's message buffer and one of the following is true. (1) There is no event in R that removes m from q's message buffer, and some processor takes more than K steps in R after $e_i$. (2) There is an event $e_r$ that removes m from q's message buffer, and some processor takes more than K steps in the schedule $e_{i+1} \ldots e_r$. A run is on-time if it contains no late messages.
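On a finite prefix of a run, lateness can be checked by counting steps in the window the definition names. The sketch below uses our own encoding, not the paper's: `steps[i]` is the processor taking event `i` (0-based), and a message is described by the index of the event that added it and the index of the event that removed it, or `None` if no event in the prefix removes it.

```python
from collections import Counter
from typing import Optional, Sequence


def is_late(steps: Sequence[int], added_at: int, removed_at: Optional[int], K: int) -> bool:
    """Lateness of one message within a finite prefix of a run."""
    end = len(steps) if removed_at is None else removed_at + 1
    window = Counter(steps[added_at + 1:end])   # events after the add, up to the removal
    return any(count > K for count in window.values())


def late_messages(steps, messages, K):
    """Indices of the (added_at, removed_at) pairs in messages that are late."""
    return [j for j, (a, r) in enumerate(messages) if is_late(steps, a, r, K)]


steps = [0, 1, 0, 1, 0, 1, 0]           # two processors alternating
msgs = [(0, 3), (0, 5), (0, None)]      # each message is added by event 0
print(late_messages(steps, msgs, K=2))  # [1, 2]
```

A message found late in a prefix stays late in every extension (if it is removed later, the window only grows), so the check is sound for detecting late messages; it cannot certify that an infinite run is on-time.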

Ideally we would like a processor to decide in a constant expected number of its own steps. Unfortunately, as we prove in Section 5, we cannot do this, even if processors run in lockstep synchrony. Instead, we characterize the time performance of our protocol using the following definition. Given an infinite run, a processor is defined inductively to be in a particular asynchronous round (or round) as follows. Asynchronous round 1 begins for processor p when p first takes a step and ends after p's Kth step. Asynchronous round r, $r > 1$, begins for p at the end of p's round $r - 1$ and ends either K of p's steps after the end of p's round $r - 1$, or as soon as p has received every received message sent to it by any processor q in q's round $r - 1$, whichever happens later. (We say "every received message" in order to make sure that no round lasts infinitely long due to p's waiting for a non-guaranteed message from q that never arrives.)
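As omniscient observers we can compute round boundaries from a trace. The following is a sketch under stated assumptions (our encoding, hypothetical names): the finite trace is treated as the whole run, so a message with no receive event counts as never received; `steps[i]` is the processor of global event `i`; and each message is `(sender, send_event, receiver, recv_event or None)`. Round ends are reported as p's own step counts.

```python
from collections import defaultdict


def round_ends(steps, messages, K, rounds):
    """For each processor p, the local step count at which its asynchronous
    rounds 1..rounds end, treating the finite trace as the whole run."""
    local = []                      # local[i]: steps[i]'s own step number at event i
    count = defaultdict(int)
    for p in steps:
        count[p] += 1
        local.append(count[p])
    procs = sorted(count)
    ends = {p: [0] for p in procs}  # ends[p][0] = 0: nothing precedes round 1
    for r in range(1, rounds + 1):
        new = {}
        for p in procs:
            end = ends[p][r - 1] + K    # a round lasts at least K of p's steps
            if r > 1:
                for q, sent, dest, recv in messages:
                    if dest != p or recv is None:
                        continue          # only messages p actually receives count
                    j = local[sent]       # q's step number when it sent the message
                    if ends[q][r - 2] < j <= ends[q][r - 1]:   # sent in q's round r - 1
                        end = max(end, local[recv])
            new[p] = end
        for p in procs:                 # all processors advance to round r together
            ends[p].append(new[p])
    return {p: ends[p][1:] for p in procs}


steps = [0, 1, 0, 1, 0, 1]
msgs = [(0, 0, 1, 5)]        # 0 sends at event 0; 1 receives it at event 5
print(round_ends(steps, msgs, K=1, rounds=3))   # {0: [1, 2, 3], 1: [1, 3, 4]}
```

In the example, processor 1's round 2 is stretched to its third step by the slow message, and a round end larger than the number of steps a processor took means that round has not finished within the trace.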

This  definition  uses  two  criteria  for  ending  a  round,  the  number  of  processor 
steps  taken  and  the  collection  of  messages  received.  These  criteria  seem  natural  in 
our  timing  model,  in  which  processors  can  take  actions  depending  on  the  receipt  of 
messages,  as  well  as  on  timeouts. 

A processor cannot compute its current asynchronous round; the definition is for our use as omniscient observers as we analyze protocols. The reason we require a round to last at least K steps is to prevent a round from collapsing to nothing if no messages are sent in the previous round. If processors take steps in round-robin order, receive and send messages only at the beginning of a round, and each message sent at the sender's ith step is received at the recipient's $(i + K)$th step (for all i), then this definition is essentially the same as the synchronous round definition in [DS]. Thus this definition is not unreasonably strong.

2.3  Safety  Conditions 

The  following  definition  restricts  what  must  happen  if  a  processor  decides,  but 
does  not  require  any  processor  to  decide.  A  protocol  is  a  transaction  commit  protocol 
if  for  every  t-admissible  run  R: 

•  Agreement  Condition:  Every  configuration  has  at  most  one  decision  value. 

•  Abort Validity Condition: If the initial value of any processor is 0, then no configuration has decision value 1.

•  Commit Validity Condition: If the initial value of every processor is 1 and R is failure-free and on-time, then no configuration has decision value 0.
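The three conditions can be phrased as a checker over the configurations of a run. This is a sketch with hypothetical names: each configuration is summarized by every processor's decision so far (`None` for undecided), and because failure-freedom and on-time delivery are properties of the whole infinite run, they are passed in as flags rather than computed.

```python
from typing import Iterable, Optional, Sequence, Set

Decision = Optional[int]   # None = undecided, 0 = abort, 1 = commit


def violated_conditions(initial: Sequence[int],
                        configs: Iterable[Sequence[Decision]],
                        failure_free: bool, on_time: bool) -> Set[str]:
    """Names of the safety conditions violated by the given configurations."""
    violated = set()
    all_commit = all(v == 1 for v in initial)
    for config in configs:
        values = {d for d in config if d is not None}   # decision values of C
        if len(values) > 1:
            violated.add("agreement")
        if 0 in initial and 1 in values:
            violated.add("abort validity")
        if all_commit and failure_free and on_time and 0 in values:
            violated.add("commit validity")
    return violated


# A late message permits an abort even though everyone voted to commit.
print(violated_conditions([1, 1, 1], [[0, None, None]], failure_free=True, on_time=False))  # set()
```

Because decision states are absorbing, a processor that decides and then crashes keeps its decision value in every later configuration, which is why agreement here covers processors that decide and subsequently fail.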

To  exclude  uninteresting  protocols,  we  require  that  each  processor  must  be  able 
to  receive  at  least  n  messages  at  each  step.  Otherwise,  processors  could  swamp  the 
message  system,  causing  messages  to  become  late  not  because  the  message  system 
misbehaves,  but  because  the  ability  of  the  processors  to  handle  all  the  incoming 
message traffic is inadequate.¹ For instance, the protocol “cause the run to be not
on-time  by  flooding  the  message  system  and  then  abort”  is  not  of  much  practical 
interest. 
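The counting argument in the footnote to this paragraph can be reproduced mechanically. The helper below is ours, not the paper's; it assumes, as the footnote does, that $n \ge 2$, that every event broadcasts n messages, and that an event receives at most $n - 1$.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for positive integers."""
    return -(-a // b)


def flooding_bounds(n: int, K: int):
    """Bounds after K*n*(n-1) + n events of the 'flood, then abort' protocol."""
    events = K * n * (n - 1) + n
    sent = events * n                                   # n messages per event
    received_at_most = events * (n - 1)                 # n - 1 receipts per event
    outstanding_at_least = sent - received_at_most      # equals events
    some_p_holds = ceil_div(outstanding_at_least, n)    # pigeonhole over n buffers
    steps_to_drain = ceil_div(some_p_holds, n - 1)      # n - 1 receipts per step
    return outstanding_at_least, some_p_holds, steps_to_drain


print(flooding_bounds(n=3, K=2))   # (15, 5, 3)
```

For every $n \ge 2$ and $K \ge 1$ the three values are $Kn(n-1) + n$, $K(n-1) + 1$ and $K + 1$, matching the footnote.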

2.4  Adversary 

The  adversary  can  be  considered  a  scheduler  —  it  decides  which  processor 
takes  a  step  next  and  what  messages  are  received.  In  the  introduction  we  gave  an 
informal  description  of  the  adversary.  This  subsection  formalizes  the  notion. 

The message pattern of finite run $R = C_1 e_1 \ldots e_k C_{k+1}$, where $e_i = (p_i, M_i, b_i)$ for all $1 \le i \le k$, is the sequence of triples $(p_1, E_1, P_1) \ldots (p_k, E_k, P_k)$, where $P_i$ is the set of processors to which messages were sent by event $e_i$, and $E_i$ is a set of integers indexing the events in the run that sent the messages, $M_i$, received in $e_i$.
The  point  of  making  this  definition  is  to  isolate  the  pattern  of  message  sending  and 
receiving  while  hiding  the  contents  of  the  messages. 

An adversary is a function that takes a message pattern and returns a processor p and a set E of integers (which may be empty) satisfying the following condition. If i is in E, then in the ith element of the message pattern, $(p_i, E_i, P_i)$, p is in $P_i$ (i.e., there actually was a message sent to p at the ith event), and in no element of the message pattern does p receive this message (i.e., the message in question has not yet been received). Thus, the adversary decides on the next processor to take a step, plus a collection of messages to be received.
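The message pattern and the constraint on the adversary's output can be sketched directly. The run encoding is hypothetical: each event is `(p, received, sent)`, where `received` lists `(sender, sender_clock, text)` and `sent` lists `(receiver, text)`. Since a processor sends at most one message to each processor per step, an event index identifies the message sent to p at that event.

```python
from typing import FrozenSet, List, Sequence, Tuple

Pattern = List[Tuple[int, FrozenSet[int], FrozenSet[int]]]


def message_pattern(events: Sequence[tuple]) -> Pattern:
    """(p_i, E_i, P_i) for each event, with 1-based indices and contents dropped."""
    event_of = {}     # (processor, clock value) -> index of the event that produced it
    clock = {}
    pattern = []
    for i, (p, received, sent) in enumerate(events, start=1):
        clock[p] = clock.get(p, 0) + 1
        event_of[(p, clock[p])] = i
        E = frozenset(event_of[(q, c)] for (q, c, _text) in received)
        P = frozenset(r for (r, _text) in sent)
        pattern.append((p, E, P))
    return pattern


def legal_choice(pattern: Pattern, p: int, E: set) -> bool:
    """Could an adversary return (p, E) on this message pattern?"""
    already = {j for (q, Ej, _P) in pattern if q == p for j in Ej}
    for i in E:
        if not 1 <= i <= len(pattern):
            return False
        _q, _Ei, Pi = pattern[i - 1]
        if p not in Pi or i in already:    # nothing sent to p, or already received
            return False
    return True


events = [
    (0, [], [(1, "coins"), (2, "coins")]),                 # e_1: 0 sends to 1 and 2
    (1, [(0, 1, "coins")], [(0, "coins"), (2, "coins")]),  # e_2: 1 receives e_1's message
]
pattern = message_pattern(events)
print(pattern[1])   # (1, frozenset({1}), frozenset({0, 2}))
```

The message texts never reach `pattern`, which is exactly the information the adversary is denied.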

Let $\mathcal{F}$ be the collection of all n-tuples of infinite sequences of n-bit strings. Each element of $\mathcal{F}$ is a possible set of choices returned by the n processors' random number generators in an infinite run. A run is uniquely determined by an adversary A, an initial configuration I, and an element F of $\mathcal{F}$. Denote this run by run(A, I, F). The construction of $\mathrm{run}(A, I, F) = C_1 e_1 C_2 e_2 \ldots$ is inductive. Let $C_1 = I$. Suppose the run up to configuration $C_i$ has been constructed. Let p and E be the result of A acting on the message pattern of run $C_1 e_1 \ldots C_i$. Then $e_i$ consists of the processor p, the messages sent to p in all the events indexed by E, and the next unused bit string in the sequence for p in F. Finally, $C_{i+1} = e_i(C_i)$. Since the adversary is a total function, run(A, I, F) is an infinite run, and thus at least one processor is nonfaulty.

¹ Suppose each processor can send n messages per step but only receive $n - 1$. Consider the protocol: at each step, broadcast a message; at step 1, decide 0. We now show that no infinite run is on-time. Let R be an infinite run. After $Kn(n-1) + n$ events, $(Kn(n-1) + n)n$ messages have been sent, and at most $(Kn(n-1) + n)(n-1)$ have been received. So there are at least $Kn(n-1) + n$ outstanding messages. By the pigeonhole principle, some processor p has at least $K(n-1) + 1$ outstanding messages (to be received). It will take p at least $K + 1$ steps to receive all those messages, by which time the run will no longer be on-time.

If the adversary were not restricted in any way, it could cause all processors (but one) to fail or no messages to be delivered, and no protocol would be possible. We limit the power of the adversary in the following reasonable way. We define a t-admissible adversary to be an adversary A such that for all initial configurations I and all $F \in \mathcal{F}$, run(A, I, F) is t-admissible.

For predicate P defined on runs, let Pr[P] be the probability of the event $\{F \in \mathcal{F} : \mathrm{run}(A, I, F) \text{ satisfies } P\}$, for a fixed adversary A and initial configuration I.

The expected value of any complexity measure for a fixed randomized protocol is defined as follows. Let T be a random variable that, given a run, returns the complexity measure of interest for that run. For fixed t-admissible adversary A and initial configuration I, let the expected value of T, taken over the random numbers F, be denoted $E(T_{A,I})$. Define the expected value for the protocol, $ET$, to be $\max_{A,I}\{E(T_{A,I})\}$.

2.5  Liveness  Condition 

Given infinite run R and integer r, let DONE(R, r) be the predicate that every nonfaulty processor decides by its asynchronous round r in R. A protocol is t-nonblocking if for any t-admissible adversary A and any initial configuration I,

$$\lim_{r \to \infty} \Pr[\mathrm{DONE}(\mathrm{run}(A, I, F), r)] = 1.$$

2.6  Problem  Statement 

Our  goal  is  to  design  a  t-nonblocking  transaction  commit  protocol. 
3. The Randomized Commit Protocol

Our protocol to solve the transaction commit problem is based on the asynchronous agreement protocol in [Be]. Similar protocols have been widely used [Br] [CC] [CMS] [R]. For the rest of this section, we assume a fixed t with $n > 2t$.

3.1  The  Protocol 

In  this  subsection,  we  present  the  randomized  transaction  commit  protocol  by 
describing,  for  each  processor  p,  the  states  and  transition  function  of  p.  First,  we 
give  an  informal  description. 

Throughout the protocol each processor keeps a vote telling what it currently wants to do with the transaction. The processor with id 0 is the coordinator; at its first step, it chooses n random bits and distributes them to the other processors, the participants, by broadcasting a coins message containing the bits. If a participant receives no message at its first step, it sends a request message to the coordinator (to try to jog it awake); if no reply is received within 2K steps, the participant sets its vote to 0 and decides 0. If a participant receives a message at its first step, it extracts the n bits and broadcasts them in a coins message, to indicate "I am participating in the protocol." If a processor does not receive a coins message from everyone within 2K steps after broadcasting one, it sets its vote to 0 and decides 0. Then each processor broadcasts its vote. If a processor does not receive n votes for 1 within 2K steps after broadcasting its own vote, it sets its vote to 0 but remains undecided.

The rest of the protocol proceeds in stages (as in [Be]), numbered from 1 up without bound. In stage s, each processor p broadcasts its vote in a stage (s, 1) message and waits to receive $n - t$ stage (s, 1) messages. If p receives more than n/2 stage (s, 1) messages with vote $v \in \{0, 1\}$, then p broadcasts v in a stage (s, 2) message; otherwise p broadcasts "?" in a stage (s, 2) message. Since two different values cannot each be reported by more than half of the n processors, all stage (s, 2) messages other than "?" carry the same value. Then p waits to receive $n - t$ stage (s, 2) messages. If p receives a stage (s, 2) message with value $v \in \{0, 1\}$, then p sets its vote to v; otherwise, p sets its vote to a random bit, either the sth random bit from the coins message if $s \le n$, or else a locally determined random bit. If p receives $n - t$ stage (s, 2) messages for value $v \in \{0, 1\}$, then p decides v.
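The two decision points of a stage can be sketched as pure functions over the values a processor has already collected. This is an illustration, not the paper's code: the names are hypothetical, the waiting and broadcasting are left out, and `coins` is assumed to be the n-bit string from the coins message, written as characters `"0"`/`"1"`.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple, Union

Value = Union[int, str]   # 0, 1, or "?"


def stage2_value(stage1_values: List[int], n: int) -> Value:
    """Value p puts in its stage (s, 2) message, given the stage (s, 1) votes it received."""
    counts = Counter(stage1_values)
    for v in (0, 1):
        if counts[v] > n / 2:      # more than half of all n processors
            return v
    return "?"


def end_of_stage(s: int, stage2_values: List[Value], n: int, t: int,
                 coins: str, rng: random.Random) -> Tuple[int, Optional[int]]:
    """New vote and decision (None = undecided) after the stage (s, 2) messages."""
    seen = [v for v in stage2_values if v != "?"]
    if seen:
        v = seen[0]                # every non-"?" value in stage s is the same
        decision = v if stage2_values.count(v) >= n - t else None
        return v, decision
    # No value seen: use the shared coin for s <= n, else a local coin.
    vote = int(coins[s - 1]) if s <= n else rng.randint(0, 1)
    return vote, None


rng = random.Random(7)
print(stage2_value([1, 1, 1], n=5))                          # 1
print(end_of_stage(2, ["?", "?", "?"], 5, 2, "01101", rng))  # (1, None)
```

The shared bit is where the preliminary exchange pays off: in stages $s \le n$, all processors that see only "?" adopt the same bit instead of flipping independent coins.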

Processor  p  uses  the  following  constants,  variables  and  subroutines.  Constants 
are  p,  n  and  K.  Variables  are: 

•  clock_p: nonnegative integer; initially 0.

•  stage_p: values are “asleep,” “request,” “coins,” “vote,” (s, 1) and (s, 2) for all s ≥ 1; initially “asleep.”

•  timer_p: nonnegative integer or nil; initially nil.

•  coins_p: n-bit string or nil; initially nil.

•  vote_p: boolean; initially p's initial value.

•  decide_p: boolean or nil; initially nil.

The  text  of  each  raw  message  consists  of  the  sending  processor’s  current  stage, 
and  optionally  a  value  (0,  1  or  “?”),  and  an  n-bit  string. 

Below we describe p's transition function, acting on state q of p, set M of raw messages, and n-bit string b. The description consists of several clusters of pseudocode. Each cluster is preceded by a predicate on q and M. The predicate of at most one cluster is true for any q and M. The state of p returned by the transition function is obtained from q by incrementing clock_p by 1, remembering the set M, and then executing the cluster (if any) whose predicate is true of q and M. The set of raw messages returned by the transition function is that indicated by the send and broadcast statements of the appropriate cluster. If no cluster's predicate is true, then no raw messages are sent, and the only changes to the state are that clock_p is incremented and the received messages are remembered.
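The structure just described, with mutually exclusive guarded clusters run after the clock is incremented and M is remembered, can be sketched as follows. The names `State`, `transition` and `CLUSTERS` are ours; only the first two clusters are transcribed, and we assume a broadcast also delivers to the sender itself, consistent with a processor waiting for n coins messages.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional, Tuple, Union

COORDINATOR = 0   # the processor with id 0


@dataclass
class State:
    p: int                                  # processor id
    n: int
    K: int
    vote: int                               # initially p's initial value
    clock: int = 0
    stage: Union[str, Tuple[int, int]] = "asleep"
    timer: Optional[int] = None
    coins: Optional[str] = None
    decide: Optional[int] = None
    history: List[frozenset] = field(default_factory=list)   # remembered message sets


Predicate = Callable[[State, frozenset], bool]
Action = Callable[[State, frozenset, str], list]   # returns [(receiver, text), ...]


def transition(q: State, M: frozenset, b: str, clusters: List[Tuple[Predicate, Action]]):
    """Increment the clock, remember M, then run the one cluster enabled on (q, M), if any."""
    enabled = [act for pred, act in clusters if pred(q, M)]
    assert len(enabled) <= 1, "cluster predicates must be mutually exclusive"
    s = replace(q, clock=q.clock + 1, history=q.history + [M])
    sent = enabled[0](s, M, b) if enabled else []
    return s, sent


def coordinator_asleep(q, M):
    return q.stage == "asleep" and q.p == COORDINATOR


def coordinator_start(s, M, b):
    s.coins, s.stage, s.timer = b, "coins", s.clock + 2 * s.K
    return [(r, (s.stage, "?", s.coins)) for r in range(s.n)]   # broadcast, self included


def participant_wakes_alone(q, M):
    return q.stage == "asleep" and q.p != COORDINATOR and not M


def participant_request(s, M, b):
    s.stage, s.timer = "request", s.clock + 2 * s.K
    return [(COORDINATOR, "request")]


CLUSTERS = [(coordinator_asleep, coordinator_start),
            (participant_wakes_alone, participant_request)]

s0, out0 = transition(State(p=0, n=3, K=2, vote=1), frozenset(), "101", CLUSTERS)
print(s0.stage, s0.timer, len(out0))   # coins 5 3
```

Because `timer` is set after the clock has been incremented, the coordinator's timeout lands 2K steps after the step that broadcast the coins.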

/*  coordinator  initiates  protocol  by  distributing  n  random  bits  */ 

stage_p = “asleep” for p = coordinator:
    coins_p := b
    stage_p := “coins”
    timer_p := clock_p + 2K
    broadcast (stage_p, “?”, coins_p)

/*  non-coordinator  wakes  up  and  requests  that  coordinator  initiate  */ 

stage_p = “asleep” for p ≠ coordinator and M = ∅:
    stage_p := “request”
    timer_p := clock_p + 2K
    send “request” to coordinator

/*  non-coordinator  receives  coins  */ 

stage_p = “asleep” or “request” for p ≠ coordinator and
there is a message in M with text (s, v, coins):
    coins_p := coins
    stage_p := “coins”
    timer_p := clock_p + 2K
    broadcast (stage_p, “?”, coins_p)

/* non-coordinator times out while waiting to receive coins */

stage_p = “request” and clock_p = timer_p:
    vote_p := 0
    decide_p := 0

/*  distributing  votes  */ 

stage_p = “coins” and either clock_p = timer_p or n coins messages have been
received:

    stage_p := “vote”
    timer_p := clock_p + 2K

    if fewer than n coins messages have been received then [
        vote_p := 0
        decide_p := 0 ]

    broadcast (stage_p, vote_p, coins_p)

/*  completing  stage  0  */ 

stage_p = “vote” and either clock_p = timer_p or n vote messages have been
received:

    stage_p := (1, 1)

    if n votes for 1 have been received
    then vote_p := 1
    else vote_p := 0

    broadcast (stage_p, vote_p, coins_p)

/*  finishing  first  part  of  stage  s  */ 

stage_p = (s, 1) and at least n − t stage (s, 1) messages have been received:
    stage_p := (s, 2)

    if more than n/2 stage (s, 1) messages received have value v, for some v,
    then broadcast (stage_p, v, coins_p)
    else broadcast (stage_p, “?”, coins_p)

/*  finishing  second  part  of  stage  s  */ 

stage_p = (s, 2) and at least n − t stage (s, 2) messages have been received:
    stage_p := (s + 1, 1)

    if a stage (s, 2) message received has value v, for some v ∈ {0, 1}, then [
        vote_p := v
        if at least n − t stage (s, 2) messages received have value v then decide_p := v ]
    else if s ≤ n then vote_p := coins_p[s]
    else vote_p := first bit of b

    broadcast (stage_p, vote_p, coins_p)

Transaction Commit Protocol: p’s transition function on input M, b, and arbitrary state
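The cluster for finishing the first part of stage s echoes a value only if more than n/2 of all n processors sent it, not more than half of the messages actually received. A small Python sketch of that choice follows; the function name and the encoding of “?” as a string are our assumptions.

```python
from collections import Counter
from typing import Sequence, Union

Value = Union[int, str]  # 0, 1, or "?"


def stage_s1_reply(values: Sequence[Value], n: int) -> Value:
    """Value p puts in its stage (s,2) message, given received stage (s,1) values.

    A value v in {0, 1} is echoed only if MORE than n/2 of the received
    messages carry it; the threshold is relative to n, not to len(values).
    """
    counts = Counter(v for v in values if v in (0, 1))
    for v in (0, 1):
        if 2 * counts[v] > n:  # integer form of counts[v] > n/2
            return v
    return "?"
```

With n = 5 and t = 2, three unanimous stage (s,1) messages already exceed n/2; this is the counting step behind Lemma 3 (n − t > n/2 whenever n > 2t).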

3.2  Proof  of  Correctness 

The proof is organized as follows. Section 3.2.1 shows the safety properties, i.e., that the protocol is a transaction commit protocol. Section 3.2.2 contains the probabilistic analysis, which is applied to show the t-nonblocking property in Section 3.2.3.

3.2.1  Safety  Conditions 

Section  3.2.1  culminates  in  Theorem  8,  which  shows  that  the  protocol  is  a 
transaction  commit  protocol. 

All the lemmas in Section 3.2.1 hold for any (infinite) run from an initial configuration. In particular, they hold for runs in which more than t processors fail. Stating these results in this way allows us to show the graceful degradation property of the protocol.

In run R, processor p is said to be in stage s, for s ≥ 1, if stage_p = (s,1) or (s,2). We say p completes stage s ≥ 0 if p ever sets stage_p to (s + 1, 1) in R. Let p’s decision states Y_0 and Y_1 be the sets of states with decide_p = 0 and decide_p = 1, respectively; Lemma 7 below shows that once p enters a state in Y_v, it stays in that set forever. Note that if no nonfaulty processor ever receives a coins message, then no processor completes stage 0.

Lemma 1: In any run from an initial configuration, if some processor p has vote_p = 0 initially, then every stage (1,1) message has value 0.

Proof:  No  processor  ever  receives  a  vote  message  with  value  1  from  p.  Thus  no 
processor  sets  its  vote  to  1  at  the  end  of  its  vote  stage,  and  no  processor  broadcasts 
a  stage  (1,1)  message  with  value  1.  □ 


Lemma 2: In any infinite run from an initial configuration, if every processor p has vote_p = 1 initially, and the run is failure-free and on-time, then every processor broadcasts a stage (1,1) message with value 1.


Proof:  First  we  show  that  each  processor  p  broadcasts  a  vote  message  with  value 
1.  Suppose  either  p  is  the  coordinator,  or  p  receives  a  message  at  its  first  step. 
Then  p  broadcasts  a  coins  message  at  its  first  step.  By  time  K  on  p’s  clock,  each 
processor  receives  p’s  coins  message  and  broadcasts  its  own  coins  message  (if  it  has 
not  already  done  so).  By  time  2K  on  p’s  clock,  p  receives  n  coins  messages.  Thus 
p  broadcasts  a  vote  message  with  value  1. 

Now suppose p is not the coordinator and does not receive any messages at its first step. It sends a request message to the coordinator, which is received by time K on p’s clock. The coordinator then broadcasts a coins message, if it has not already done so, and p receives the coins message by time 2K on p’s clock. Then p broadcasts a coins message; by time 3K on p’s clock, each processor receives p’s coins message and broadcasts its own coins message (if it has not already done so). By time 4K on p’s clock, p receives n coins messages. Thus p broadcasts a vote message with value 1.

Now we show that every processor p receives n vote messages within 2K ticks of its clock after it broadcasts its vote. Processor p broadcasts its vote as soon as it receives its nth coins message. Suppose its clock reads T then. Since the run is on-time, every other processor receives its nth coins message, and broadcasts its vote, by the time p’s clock reads T + K. Thus p receives all n vote messages by the time its clock reads T + 2K. Then p broadcasts its stage (1,1) message with value 1. □

Lemma 3: In any run from an initial configuration, if every stage (s,1) message has value v ∈ {0, 1}, then every processor that completes stage s decides v at stage s, for any s ≥ 1.

Proof: Let p be any processor that broadcasts a stage (s,2) message. Then p receives at least n − t stage (s,1) messages, all with value v ∈ {0, 1} by assumption. Since n > 2t, n − t > n/2. Thus p broadcasts a stage (s,2) message with value v.

Now  let  p  be  any  processor  that  completes  stage  s.  Then  p  receives  at  least 
n  —  t  stage  (s,  2)  messages,  all  with  value  v.  Thus  p  decides  v.  □ 

For any s ≥ 1, we call a stage (s,2) message with value v ∈ {0, 1} an S-message (“S” for “set”), because the receipt of such a message can cause a processor to set its vote to v (if this message is among the first n − t stage (s,2) messages received by the processor).

Lemma 4: In any run from an initial configuration, there is at most one value sent in S-messages during any stage s ≥ 1.

Proof: In order to send an S-message for some value v at stage s, a processor must receive more than n/2 stage (s,1) messages with value v, so more than n/2 processors broadcast a stage (s,1) message with value v. Since processors do not broadcast conflicting messages, fewer than n/2 processors can broadcast a stage (s,1) message with value w ≠ v. Thus, no processor receives more than n/2 stage (s,1) messages with value w, and no processor sends an S-message for w at stage s. □
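Lemma 4 is a pure counting fact, so for small n it can be checked exhaustively. The Python sketch below (function names are ours) enumerates every way the n processors could each broadcast one stage (s,1) value, and every set of at least n − t of those messages that a receiver might collect, and confirms that S-messages can carry at most one value. It is a sanity check for small parameters only, not a substitute for the proof.

```python
from itertools import combinations, product
from typing import Sequence, Set


def possible_s_values(sent: Sequence[int], n: int, t: int) -> Set[int]:
    """Values that some receiver of >= n - t stage (s,1) messages could echo."""
    found: Set[int] = set()
    for k in range(n - t, n + 1):
        for senders in combinations(range(n), k):
            received = [sent[i] for i in senders]
            for v in (0, 1):
                if 2 * received.count(v) > n:  # more than n/2 messages with v
                    found.add(v)
    return found


def lemma4_holds(n: int, t: int) -> bool:
    # Each processor broadcasts exactly one (non-conflicting) stage (s,1) value.
    return all(len(possible_s_values(sent, n, t)) <= 1
               for sent in product((0, 1), repeat=n))
```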


Lemma 5: In any run from an initial configuration, if any processor decides v before stage 1, then

(1) v = 0, and

(2) every processor that completes stage 1 decides v by the end of stage 1.

Proof:  Suppose  p  decides  before  stage  1. 

(1)  By  inspecting  the  code,  we  see  that  p  decides  0,  and  sets  its  vote  to  0  before 
broadcasting  its  vote  message. 

(2) As in the proof of Lemma 1, every stage (1,1) message has value 0, and by Lemma 3, every processor that completes stage 1 decides 0. □

Lemma 6: In any run from an initial configuration, if some processor decides v at stage s ≥ 1, then

(1) no processor decides w ≠ v at stage s, and

(2) every processor that completes stage s + 1 decides v at stage s + 1.

Proof: Suppose processor p decides v at stage s ≥ 1. Let q be any processor that completes stage s. Since p decides v at stage s, it receives at least n − t stage (s,2) messages with value v before completing stage s. Thus, since n > 2t and q receives at least n − t stage (s,2) messages before completing stage s, at least one of these messages is from a processor from which p receives an S-message for v in stage s. Since processors do not broadcast conflicting messages, q receives at least one S-message for v at stage s. By Lemma 4, q sets its vote to v, and thus q broadcasts a stage (s + 1, 1) message with value v.


(1) If q decides in stage s, then q decides v.

(2) By Lemma 3, every processor that completes stage s + 1 decides v at stage s + 1. □
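The proof of Lemma 6 relies on the fact that, when n > 2t, any two sets of n − t processors share at least one member. A brute-force Python check of this fact for small n (our own illustration; the function name is hypothetical) also shows that it fails as soon as n ≤ 2t.

```python
from itertools import combinations


def quorums_intersect(n: int, t: int) -> bool:
    """True iff every two (n - t)-subsets of n processors have a common member."""
    quorums = [set(c) for c in combinations(range(n), n - t)]
    return all(a & b for a in quorums for b in quorums)
```

The same counting gives the stronger bound |A ∩ B| ≥ |A| + |B| − n = n − 2t for any two such sets A and B.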

Lemma 7: In any run from an initial configuration, decide_p changes value at most once, for every processor p.

Proof: Pick any processor p. If decide_p is set to some value v before stage 1, then by Lemma 5, every processor that completes stage 1 decides v by the end of stage 1. If decide_p is set to v for the first time in stage s ≥ 1, then by Lemma 6, every processor that completes stage s + 1 decides v by the end of stage s + 1. Lemma 3 shows that for any r ≥ 1, if every processor that completes stage r decides v at stage r (and hence sets its vote to v, so that every stage (r + 1, 1) message has value v), then any processor that completes stage r + 1 decides v at stage r + 1. Thus every later assignment to decide_p assigns the same value v. □

Theorem  8:  Protocol  1  is  a  transaction  commit  protocol. 

Proof: Let R be a t-admissible run. First we show the agreement condition, that there is at most one decision value in every configuration of R. If some processor decides before stage 1, then Lemmas 5 and 7 give the result. If no processor decides before stage 1, so that the first decision is made at some stage s ≥ 1, then Lemmas 6 and 7 give the result.

Next we show the abort validity condition. Suppose some processor begins with initial value 0. If no processor completes stage 0, then Lemma 5 shows that no processor decides 1. If some processor completes stage 0, then all nonfaulty processors complete stage s, for all s ≥ 0. Lemmas 1 and 3 (the latter with s = 1 and v = 0) give the result.

Finally,  we  show  the  commit  validity  condition.  Suppose  R  is  failure-free  and 
on-time,  and  all  processors  begin  with  1.  Then  Lemmas  2  and  3  give  the  result.  □ 

Since Lemmas 1 through 7 are true for any (infinite) run from an initial configuration, the agreement, abort validity, and commit validity conditions are true even for runs in which more than t processors fail. This is the graceful degradation property exhibited by our protocol.

3.2.2  Probabilistic  Properties 

The analysis in this subsection is directed toward showing that the probability that all processors that complete stage s decide by stage s approaches 1 as s increases. Recall that probabilities are taken over the random information, holding the adversary and initial configuration fixed.
For the following definitions, fix adversary A, initial configuration I, and F and F' in ℱ. Let R = run(A, I, F) and R' = run(A, I, F').

Define  F(p,  k)  to  be  the  kth  element  in  the  sequence  for  p  in  F. 

Define coins(F) to be F(0, 1) (i.e., the coordinator’s first n-bit string). It is easy to see that if coins_p is ever non-nil in R, then it equals coins(F), for all p. We denote the sth element of coins(F) by coins(F)[s].

For processor p and s ≥ 1, define index(R, p, s) to be the number of steps taken by p to complete stage s in R. If p does not complete stage s, then index(R, p, s) is undefined. Thus index(R, p, s) is also the index into the sequence for p in F of the bit string used to determine the value of vote_p in stage s, in case s > n and p receives no S-message in stage s.

The next definition maps a bit to each processor and each stage s > n in a run, such that each stage gets “new” bits. This mapping is consistent with the mapping implemented in the protocol for those cases where a processor uses a random bit. Let random(R, p, s), for processor p and s > n, be defined as follows. (1) If p completes stage s in R, then random(R, p, s) is the first bit of F(p, k), where k = index(R, p, s). (2) If p does not complete stage s in R, then random(R, p, s) is the second bit of F(p, s + 1) (i.e., a safe default).

For 0 ≤ s ≤ n, define F and F' to be (A, I, s)-equal if coins(F)[i] = coins(F')[i] for all i, 1 ≤ i ≤ s. For s > n, define F and F' to be (A, I, s)-equal if F and F' are (A, I, n)-equal, and for every i, n + 1 ≤ i ≤ s, and every processor p, random(R, p, i) = random(R', p, i).

For s ≥ 1, define v(R, s) to be the value of an S-message sent in run R at stage s. If no S-message is sent in R at stage s, then let v(R, s) = 0. By Lemma 4, v(R, s) is uniquely defined.

Define MATCH(R, s) to be the predicate that if s ≤ n, then coins(F)[s] = v(R, s), and if s > n, then random(R, p, s) = v(R, s) for all p.

Define SAME(R, s) to be the predicate that all processors that complete stage s in R set their votes to the same value in stage s.

Define DECIDE(R, s) to be the predicate that each processor that completes stage s has decided by the end of stage s in R.


The next lemma characterizes two aspects of runs that do not depend on the random information, once an adversary and initial configuration are fixed.

Lemma 9: Let A be an adversary, I an initial configuration, and F and F' ∈ ℱ. Let R = run(A, I, F) = C_1 e_1 C_2 ... and R' = run(A, I, F') = C'_1 e'_1 C'_2 ...

(1) For all i ≥ 1, the message pattern of C_1 e_1 ... C_i is the same as the message pattern of C'_1 e'_1 ... C'_i.

(2) For all processors p and all s ≥ 1, index(R, p, s) = index(R', p, s).

Proof: (1) The structure of the protocol is such that the random information does not affect which processors send messages to which other processors; it only affects the values of the local variables and the message contents. But this is the very information not available to the adversaries under consideration. Thus, for a fixed adversary and initial configuration, the sequence of processor steps and the message delays are the same, regardless of the random information.

(2)  Follows  from  (1).  □ 

The  next  lemma  states  that  the  value  of  an  S-message  sent  in  stage  s  +  1  only 
depends  on  the  random  information  available  through  stage  s,  once  an  adversary 
and  initial  configuration  are  fixed. 

Lemma 10: Let R = run(A, I, F) and R' = run(A, I, F') for adversary A, initial configuration I, and F and F' in ℱ. If F and F' are (A, I, s)-equal, then v(R, s + 1) = v(R', s + 1), for any s ≥ 0.

Proof: By Lemma 9, the message patterns for R and R' are the same. Since F and F' are (A, I, s)-equal, the random information that affects the local variables and message contents in R and R' up through stage s is the same in F and F'. Thus, the values of corresponding processors’ variables, and the contents of corresponding messages sent up through stage s, are the same in R and R'. The random information used in stage s + 1 is not used until the end of stage s + 1, so the same messages are sent in stage s + 1 in R and R', even though the stage s + 1 random information might be different in F and F'. □

The  next  lemma  states  some  simple  relationships  between  match,  same,  and 
DECIDE. 

Lemma 11: Let R = run(A, I, F) for adversary A, initial configuration I and F ∈ ℱ. For all s ≥ 1,

(1) MATCH(R, s) implies SAME(R, s), and

(2) SAME(R, s) implies DECIDE(R, s + 1).

Proof: Fix s ≥ 1.

(1) If s ≤ n, then MATCH(R, s) means that coins(F)[s] = v(R, s). Thus coins_p[s] has the same value as any S-message sent in stage s of R, for all p. Thus, each processor that completes stage s sets its vote to v(R, s), whether or not it receives an S-message, and SAME(R, s) is true.

If s > n, then MATCH(R, s) means that the first bit of F(p, k), where k = index(R, p, s), is equal to the value of any S-message sent in stage s of R, for all p. Thus, each processor that completes stage s sets its vote to v(R, s), and SAME(R, s) is true.

(2) If SAME(R, s) is true, then all stage (s + 1, 1) messages have the same value v ∈ {0, 1}. Thus all stage (s + 1, 2) messages have value v. Thus, every processor that completes stage s + 1 decides v, and DECIDE(R, s + 1) is true. □

The following technical lemma concerns any equivalence class of ℱ, where the equivalence is defined by (A, I, s)-equality.

Lemma 12: Fix adversary A, initial configuration I, and s ≥ 0. Partition ℱ into the maximal equivalence classes, within each of which all elements are (A, I, s)-equal. Pick any class C.

(1) MATCH(run(A, I, F), i) = MATCH(run(A, I, F'), i) for all i, 1 ≤ i ≤ s, and any F and F' in C.

(2) If s < n, then MATCH(run(A, I, F), s + 1) is true for half the elements F of C; if s ≥ n, then MATCH(run(A, I, F), s + 1) is true for a 1/2^n fraction of the elements F of C.

Proof: (1) Choose any i, 1 ≤ i ≤ s, and any F and F' in C. Let R = run(A, I, F) and R' = run(A, I, F'). Since F and F' are (A, I, i − 1)-equal, v(R, i) = v(R', i), by Lemma 10. Since F and F' are (A, I, i)-equal, coins(F)[i] = coins(F')[i] if i ≤ n, and random(R, p, i) = random(R', p, i) for all p if i > n; thus MATCH(R, i) = MATCH(R', i).

(2) By Lemma 10, v(run(A, I, F), s + 1) is the same for all F ∈ C.

Suppose s < n. In half the elements F of C, coins(F)[s + 1] = 0, and in half coins(F)[s + 1] = 1, since (A, I, s)-equality constrains only the first s bits of coins(F). Thus MATCH(run(A, I, F), s + 1) is true for half the elements F of C.
Suppose s ≥ n. Let R = run(A, I, F) for F in C. MATCH(R, s + 1) means random(R, p, s + 1) = v(R, s + 1) for all p. The position of random(R, p, s + 1) in F depends on whether p completes stage s + 1 in R or not. By Lemma 9, either p completes stage s + 1 in R for all F in C, or p fails to complete stage s + 1 in R for all F in C. If p does not complete stage s + 1, then random(R, p, s + 1) is the second bit of F(p, s + 2), obviously a fixed position for all F in C. If p does complete stage s + 1, then random(R, p, s + 1) is the first bit of F(p, k), where k = index(R, p, s + 1). By Lemma 9, k is the same for all F in C, so this is also a fixed position for all F in C. The positions of random(R, p, s + 1) for all p are all distinct, and none of them is among the positions constrained by (A, I, s)-equality. Thus a 1/2^n fraction of the elements F of C have random(R, p, s + 1) = v(R, s + 1) for all p. □

The next lemma is the key to the termination of the protocol, as well as to its good time performance. It says that the random information used to set votes matches the value in S-messages with probability 1/2 in each of the first n stages, and with a smaller, but still positive, probability 1/2^n in each subsequent stage.

Lemma 13: Fix adversary A and initial configuration I. Then

$$
\Pr[\mathrm{MATCH}(\mathrm{run}(A,I,F),s)] =
\begin{cases}
1/2 & \text{if } s \le n, \\
1/2^n & \text{if } s > n.
\end{cases}
$$

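Proof: Apply part (2) of Lemma 12 with s − 1 in place of s: in every equivalence class of ℱ, MATCH(run(A, I, F), s) holds for the same fraction of the elements F, namely 1/2 if s ≤ n and 1/2^n if s > n. Since the classes partition ℱ, this fraction is also the overall probability. □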

The  next  lemma  shows  that  the  events  of  not  matching  in  different  stages  are 
independent. 

Lemma 14: Fix adversary A and initial configuration I. Let R = run(A, I, F) for F ∈ ℱ. Then for any s ≥ 1,

$$
\Pr[\neg\mathrm{MATCH}(R,1) \wedge \cdots \wedge \neg\mathrm{MATCH}(R,s)] = \Pr[\neg\mathrm{MATCH}(R,1)] \cdots \Pr[\neg\mathrm{MATCH}(R,s)].
$$

Proof: Pick any i, 1 ≤ i ≤ s. We will show that

$$
\Pr[\neg\mathrm{MATCH}(R,1) \wedge \cdots \wedge \neg\mathrm{MATCH}(R,i)] = \Pr[\neg\mathrm{MATCH}(R,1) \wedge \cdots \wedge \neg\mathrm{MATCH}(R,i-1)] \cdot \Pr[\neg\mathrm{MATCH}(R,i)].
$$

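Let X be the set of all F ∈ ℱ such that ¬MATCH(R, 1) ∧ ... ∧ ¬MATCH(R, i − 1) is true, where R = run(A, I, F). Partition ℱ into equivalence classes based on (A, I, i − 1)-equality. If F is in X, and F and F' are (A, I, i − 1)-equal, then F' is also in X, by part (1) of Lemma 12; thus X is a union of equivalence classes. Pick any equivalence class C that is a subset of X. By part (2) of Lemma 12, ¬MATCH(R, i) holds for the same fraction of the elements of C as of ℱ as a whole, namely Pr[¬MATCH(R, i)] (Lemma 13). Summing over the classes contained in X gives the displayed equation, and induction on i gives the result. □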

The next lemma shows that the probability that all processors that complete stage s decide by stage s approaches 1 as s increases.

Lemma 15: For any adversary A and initial configuration I,

$$
\lim_{s \to \infty} \Pr[\mathrm{DECIDE}(\mathrm{run}(A,I,F),s)] = 1.
$$

Proof: Let R = run(A, I, F). First note that

$$
\Pr[\mathrm{DECIDE}(R,s)] \ge \Pr[\mathrm{MATCH}(R,1) \vee \cdots \vee \mathrm{MATCH}(R,s-1)].
$$

The reason is that if MATCH(R, s') is true for some s', 1 ≤ s' ≤ s − 1, then by Lemma 11, SAME(R, s') is true, and thus DECIDE(R, s' + 1) is true. Since s' + 1 ≤ s, any processor that completes stage s has already completed stage s' + 1, so DECIDE(R, s) is true.

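$$
\begin{aligned}
\Pr[\mathrm{MATCH}(R,1) \vee \cdots \vee \mathrm{MATCH}(R,s-1)]
&= 1 - \Pr[\neg\mathrm{MATCH}(R,1) \wedge \cdots \wedge \neg\mathrm{MATCH}(R,s-1)] \\
&= 1 - \prod_{i=1}^{s-1} \big(1 - \Pr[\mathrm{MATCH}(R,i)]\big), \qquad \text{by Lemma 14} \\
&\ge 1 - \Big(1 - \frac{1}{2^n}\Big)^{s-1}, \qquad \text{by Lemma 13.}
\end{aligned}
$$

Since (1 − 1/2^n)^(s−1) tends to 0 as s increases, we are done. □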
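As a numerical illustration (not part of the proof), the Python sketch below evaluates the lower bound 1 − (1 − 1/2^n)^(s−1) used above, alongside the sharper product 1 − ∏ (1 − Pr[MATCH(R, i)]) obtained by plugging in the exact per-stage values from Lemma 13. The function names are ours.

```python
from math import prod


def match_prob(i: int, n: int) -> float:
    """Pr[MATCH(R, i)] from Lemma 13."""
    return 0.5 if i <= n else 2.0 ** -n


def crude_bound(n: int, s: int) -> float:
    """1 - (1 - 1/2^n)^(s-1): the bound on Pr[DECIDE(R, s)] used in Lemma 15."""
    return 1.0 - (1.0 - 2.0 ** -n) ** (s - 1)


def sharper_bound(n: int, s: int) -> float:
    """1 - prod_{i<s} (1 - Pr[MATCH(R, i)]), using the exact values of Lemma 13."""
    return 1.0 - prod(1.0 - match_prob(i, n) for i in range(1, s))


def stages_needed(n: int, target: float) -> int:
    """Smallest s for which the crude bound reaches `target` (0 <= target < 1)."""
    s = 1
    while crude_bound(n, s) < target:
        s += 1
    return s
```

The crude bound is deliberately loose: for n = 10, `stages_needed(10, 0.99)` is in the thousands, whereas the sharper product already exceeds 0.99 at s = 8. The proof only needs the limit to be 1.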

3.2.3  Liveness  Condition 

Lemmas  16  and  17  convert  Lemma  15  into  a  statement  about  the  predicate 
DONE,  in  order  to  show  the  t-nonblocking  property  in  Theorem  18. 

Lemma 16: In any run from an initial configuration, each processor that completes stage 0 without having decided is in at most asynchronous round 6 when it completes stage 0.

Proof: Suppose p completes stage 0 without having decided. Then p obtains the n random bits in some message by its 2Kth step, and broadcasts its coins message. At most 4K steps later, p completes stage 0. Since each asynchronous round lasts at least K steps, at most 6 rounds elapse. □
The next lemma shows that each stage s ≥ 1 takes only a bounded number of asynchronous rounds.

Lemma 17: In any run from an initial configuration, if each processor that completes stage s ≥ 0 is in at most asynchronous round r when it completes stage s, then each processor that completes stage s + 1 is in at most asynchronous round r + 2 when it completes stage s + 1.


Proof: Let p be any processor that broadcasts a stage (s + 1, 1) message. This happens when p completes stage s, so all stage (s + 1, 1) messages are at most round r messages.

Let p be any processor that broadcasts a stage (s + 1, 2) message. Processor p cannot enter round r + 1 until it has received the last of the round r messages, including all the stage (s + 1, 1) messages. Immediately after receiving the last of these (if not before), p broadcasts its stage (s + 1, 2) message, so all stage (s + 1, 2) messages are at most round r + 1 messages.

No processor p can enter round r + 2 until it has received the last of the round r + 1 messages, including all the stage (s + 1, 2) messages. Yet by the time p receives all the stage (s + 1, 2) messages, p has completed stage s + 1. □

Theorem 18: Protocol 1 is t-nonblocking.

Proof: Pick any t-admissible run R. Suppose no nonfaulty processor receives a coins message in R. Then each nonfaulty processor decides 0 by time 2K on its clock, i.e., by round 2. Now suppose some nonfaulty processor receives a coins message in R. Then, since R is t-admissible, every nonfaulty processor receives a coins message in R, and completes stage s, for all s ≥ 0. By Lemmas 16 and 17, DECIDE(R, s) implies DONE(R, 6 + 2s) for any t-admissible run R. Lemma 15 gives the result. □

3.3  Time  Complexity 

Recall that expectation is defined in Section 2.4 to be taken over t-admissible adversaries and initial configurations. First, we show that the expected number of stages is less than 4.

Lemma  19:  Let  X  be  a  random  variable  giving  the  least  s  such  that  all  processors 
that  complete  stage  s  decide  by  stage  s.  Then  EX  <  4. 


Proof: Fix t-admissible adversary A and initial configuration I. Let R = run(A, I, F), for F in ℱ. Let q_s = Pr[¬MATCH(R, s)]. Let Y be a random variable giving the least number s such that all processors that complete stage s have the same vote at the end of stage s. By Lemma 3, X ≤ Y + 1.

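$$
\begin{aligned}
EX \le E(Y+1) = 1 + EY &= 1 + \sum_{s=1}^{\infty} s \cdot \Pr[Y = s] \\
&\le 1 + \sum_{s=1}^{\infty} s \cdot \Pr\Big[\Big(\bigwedge_{i=1}^{s-1} \neg\mathrm{MATCH}(R,i)\Big) \wedge \mathrm{MATCH}(R,s)\Big] \\
&= 1 + \sum_{s=1}^{\infty} s \cdot q_1 q_2 \cdots q_{s-1} (1 - q_s), \qquad \text{by Lemma 14} \\
&= 1 + \sum_{s=1}^{\infty} s \cdot q_1 q_2 \cdots q_{s-1} - \sum_{s=1}^{\infty} s \cdot q_1 q_2 \cdots q_s \\
&= 1 + 1 + \sum_{s=1}^{\infty} (s + 1) \cdot q_1 q_2 \cdots q_s - \sum_{s=1}^{\infty} s \cdot q_1 q_2 \cdots q_s \\
&= 2 + \sum_{s=1}^{\infty} (s + 1 - s) \cdot q_1 q_2 \cdots q_s \\
&= 2 + \sum_{s=1}^{\infty} q_1 q_2 \cdots q_s \\
&= 2 + \Big(\sum_{s=1}^{n} q_1 q_2 \cdots q_s\Big) + (q_1 q_2 \cdots q_n) \sum_{s=n+1}^{\infty} q_{n+1} \cdots q_s .
\end{aligned}
$$

We simplify using specific values for q_s. For 1 ≤ s ≤ n, q_s = 1/2, and for s > n, q_s = 1 − 1/2^n, by Lemma 13. Hence

$$
\begin{aligned}
EX &\le 2 + \sum_{s=1}^{n} \Big(\frac{1}{2}\Big)^{s} + \frac{1}{2^n} \sum_{s=n+1}^{\infty} \Big(1 - \frac{1}{2^n}\Big)^{s-n} \\
&< 2 + \sum_{s=1}^{\infty} \Big(\frac{1}{2}\Big)^{s} + \frac{1}{2^n} \sum_{s=1}^{\infty} \Big(1 - \frac{1}{2^n}\Big)^{s} \\
&= 3 + \frac{1}{2^n} \cdot \frac{1 - 1/2^n}{1 - (1 - 1/2^n)} \\
&= 3 + \frac{1}{2^n}\,(2^n - 1) < 4. \qquad \square
\end{aligned}
$$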
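The middle expression 2 + Σ_{s≥1} q_1 q_2 ⋯ q_s in the derivation can be evaluated numerically as a check. The Python sketch below (function name ours) sums it with the q_s values of Lemma 13, truncating once the terms become negligible. Summing the two geometric series exactly gives 4 − 2/2^n, slightly below the final bound 3 + (2^n − 1)/2^n = 4 − 1/2^n, because the last step replaces Σ_{s=1}^{n} (1/2)^s = 1 − 1/2^n by 1.

```python
def expected_stage_sum(n: int, eps: float = 1e-15) -> float:
    """2 + sum_{s>=1} q_1 q_2 ... q_s, with q_s = Pr[not MATCH(R, s)] from Lemma 13.

    q_s = 1/2 for s <= n and q_s = 1 - 1/2^n for s > n.  The infinite sum is
    truncated once s > n and the running product drops below eps.
    """
    total, running, s = 2.0, 1.0, 1
    while True:
        q = 0.5 if s <= n else 1.0 - 2.0 ** -n
        running *= q          # running = q_1 q_2 ... q_s
        total += running
        if s > n and running < eps:
            return total
        s += 1
```

For n = 3, for example, the exact sum is 3.75 and the final bound is 3.875.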


Theorem 20: All nonfaulty processors decide in a constant expected number of asynchronous rounds.




Proof: Let R = run(A, I, F) for some t-admissible adversary A, initial configuration I, and F ∈ ℱ. If no nonfaulty processor receives a coins message in R, then every nonfaulty processor decides by round 2.

Suppose some nonfaulty processor receives a coins message in R. Then, since R is t-admissible, every nonfaulty processor p receives a coins message in R, and completes stage s, for all s ≥ 0. By Lemma 16, p is in at most asynchronous round 6 when it completes stage 0. By Lemma 17, when p completes stage s of Protocol 1, it is in at most asynchronous round 6 + 2s. The expected number of stages is less than 4, by Lemma 19. Since the bound 6 + 2s is linear in s, all nonfaulty processors therefore decide within an expected number of asynchronous rounds less than 6 + 2 · 4 = 14. □

4.  Lower  Bound  on  Number  of  Processors 

The  lower  bounds  proved  in  the  next  two  sections  hold  even  if  processors  run 
in  lockstep  synchrony  and  possess  an  atomic  broadcast  capability.  In  this  section, 
we  first  give  relevant  details  of  this  stronger  model,  and  then  show  that  the  number 
of  faults  tolerated  by  our  transaction  commit  protocol  is  optimal. 

A processor failure is represented by an explicit failure step, denoted (p, ⊥, b). After a failure step for p, p is in a distinguished failed state. Thus failures can be evidenced in finite runs. (Of course, processors cannot detect failures because message delivery is asynchronous.) A processor is faulty in a run if it takes a failure step; otherwise it is nonfaulty.

Processors take steps in round-robin order, 0 through n − 1; a schedule of the form (0, M_1, b_1) ... (n − 1, M_n, b_n) is a cycle. To enforce the round-robin behavior, each configuration has a turn component, designating which processor’s turn it is to take a step. An initial configuration has turn = 0. In order for an event e = (p, *, b) to be applicable to a configuration C, turn(C) must equal p, and if p is in the failed state in C, then e must be a failure step. After an event is applied, the resulting configuration’s turn component is incremented by 1 (modulo n).
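The turn discipline can be captured in a few lines of Python. The sketch below models only the turn component and which processors have failed (states, buffers and messages are abstracted away); `Config`, `apply_event`, `is_cycle`, and the use of None for ⊥ are our hypothetical choices, not definitions from the text.

```python
from dataclasses import dataclass, field
from typing import Any, FrozenSet, List, Tuple

BOTTOM = None                 # hypothetical stand-in for ⊥ in a failure step (p, ⊥, b)
Event = Tuple[int, Any, Any]  # (p, M or BOTTOM, b)


@dataclass(frozen=True)
class Config:
    n: int
    turn: int = 0  # initial configurations have turn = 0
    failed: FrozenSet[int] = field(default_factory=frozenset)


def apply_event(c: Config, e: Event) -> Config:
    """Apply e = (p, *, b) to c, enforcing the round-robin and failed-state rules."""
    p, msgs, _bits = e
    if p != c.turn:
        raise ValueError(f"event of processor {p} not applicable: turn is {c.turn}")
    if p in c.failed and msgs is not BOTTOM:
        raise ValueError(f"processor {p} has failed and may only take failure steps")
    failed = c.failed | {p} if msgs is BOTTOM else c.failed
    return Config(n=c.n, turn=(c.turn + 1) % c.n, failed=failed)


def is_cycle(schedule: List[Event], n: int) -> bool:
    """A cycle is one step by each of processors 0, 1, ..., n - 1, in that order."""
    return [p for p, _, _ in schedule] == list(range(n))
```

Because the adversary cannot reorder steps in this model, its only levers are the failure steps and the message delays, which is what the sketch leaves open.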

The  guarantee  definition  is  no  longer  needed,  since  atomic  broadcast  is  allowed. 

The delay of a message m that is received in run R is the number of the cycle to which the receiving event belongs minus the sending time of m. An infinite run R is t-admissible, for 0 < t < n, if

•  the  first  configuration  is  an  initial  configuration, 

•  at  most  t  processors  are  faulty, 

•  all  messages  sent  to  a  nonfaulty  processor  are  received,  and 
•  all received messages have delay at least 1.

In  this  model,  the  adversary  cannot  schedule  when  processors  take  steps,  but 
can  only  determine  when  a  processor  fails  and  what  the  message  delays  are. 

In  this  section  we  show  that  no  protocol,  even  a  randomized  one,  can  solve  the 
transaction  commit  problem  unless  more  than  half  the  processors  are  nonfaulty.  The 
intuition  behind  the  proof  is  similar  to  that  for  the  coordinated  attack  problem  (first 
posed in [G]; also analyzed in [HM]). We partition the processors into two groups,
each  of  size  at  most  t.  Given  a  run  that  decides  1  (in  which  all  processors  begin 
with  1),  we  work  backwards  from  the  end  of  the  run  to  the  beginning,  delaying 
messages  between  the  two  groups  and  showing  that  the  resulting  runs  must  still 
decide  1.  Eventually  we  get  a  run  in  which  no  messages  between  the  groups  are 
received,  yet  the  processors  decide  1.  This  situation  leads  to  a  contradiction,  since 
one  group  could  have  started  with  0’s,  in  which  case  the  decision  should  be  0. 

The  actual  construction  of  the  runs  is  fairly  involved,  and  is  facilitated  by  the 
following  definitions  and  lemmas. 

Let state(p,C) be the state of processor p in configuration C, and buff(p,C)
be the state of p's buffer in C. Given a schedule σ and a subset S of the processors,
define σ|S to be the subsequence of σ consisting of exactly those events that are
steps for processors in S. Also define kill(S,σ) to be the schedule obtained from σ
by replacing every event (p, *, b) (where * can be M or ⊥) with (p, ⊥, b) whenever p
is in S; similarly, define deafen(S,σ) to be the schedule obtained from σ by replacing
every event (p, *, b) (where * can be M or ⊥) with (p, ∅, b) whenever p is in S.
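The three schedule operators can be written directly as list transformations. This is a sketch, assuming a schedule is a Python list of `(p, M, b)` tuples with ⊥ encoded as `None` and ∅ as an empty `frozenset`; the function names `restrict`, `kill` and `deafen` are chosen to mirror σ|S, kill(S,σ) and deafen(S,σ) and are not taken from any library.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

FAIL = None                                        # stand-in for ⊥ (assumption)
Event = Tuple[int, Optional[FrozenSet[str]], str]  # (p, M or ⊥, b)
Schedule = List[Event]


def restrict(sigma: Schedule, S: Set[int]) -> Schedule:
    """sigma|S: the subsequence of events that are steps of processors in S."""
    return [e for e in sigma if e[0] in S]


def kill(S: Set[int], sigma: Schedule) -> Schedule:
    """Replace every event (p, *, b) with (p, ⊥, b) whenever p is in S."""
    return [(p, FAIL, b) if p in S else (p, m, b) for p, m, b in sigma]


def deafen(S: Set[int], sigma: Schedule) -> Schedule:
    """Replace every event (p, *, b) with (p, ∅, b) whenever p is in S."""
    return [(p, frozenset(), b) if p in S else (p, m, b) for p, m, b in sigma]
```

Both operators leave the steps of processors outside S untouched, so for the complement S' of S, kill(S,σ)|S' = deafen(S,σ)|S' = σ|S'; the proofs below combine exactly this kind of equality with Lemma 21.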

Lemma 21: Let σ be a schedule applicable to configuration C and τ be a schedule
applicable to configuration D. Let S be a set of processors. If state(p,C) =
state(p,D) for all processors p in S and if σ|S = τ|S, then for any processor p
in S, state(p,σ(C)) = state(p,τ(D)).

Proof: Use induction on the length of σ|S, and the fact that the transition functions
are deterministic, given states, messages and random numbers. □
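The intuition behind Lemma 21 can be checked on a toy model. In the sketch below, which is an illustration and not part of the proof, `toy_transition` is a hypothetical deterministic function of a processor's state, the messages it receives and its random string; because a processor's state changes only through its own events, two schedules with the same restriction to S drive the processors in S to the same states. Buffers and applicability are not modeled, so both schedules are simply assumed to be applicable, as the lemma requires.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Event = Tuple[int, Optional[FrozenSet[str]], str]  # (p, M or None for ⊥, b)
State = Tuple                                      # hypothetical processor state


def toy_transition(state: State, msgs: Optional[FrozenSet[str]], rand: str) -> State:
    """A hypothetical deterministic transition: record what was received and drawn."""
    seen = "FAIL" if msgs is None else tuple(sorted(msgs))
    return state + ((seen, rand),)


def run_states(states: Dict[int, State], sigma: List[Event]) -> Dict[int, State]:
    """Apply the events of sigma in order; each event updates only its own processor."""
    out = dict(states)
    for p, msgs, rand in sigma:
        out[p] = toy_transition(out[p], msgs, rand)
    return out


def restrict(sigma: List[Event], S: Set[int]) -> List[Event]:
    return [e for e in sigma if e[0] in S]
```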

Given a partition of the set of processors P into two sets S and S', define an
intergroup message (relative to S and S') to be a message sent from a processor in
S to a processor in S' or vice versa.

Lemma 22: Let S and S' be a partition of the set of processors, and let C and D be
two configurations such that state(p,C) = state(p,D) and buff(p,C) ⊆ buff(p,D)
for all p in S. Let σ be a schedule applicable to C in which any intergroup message
that is received by p ∈ S in σ is in buff(p,C). Then

(a) the schedule φ = kill(S',σ) is applicable to D;

(b) if no processor in S' is in a failed state in D, then the schedule τ = deafen(S',σ)
is applicable to D.

Proof: We show (b); (a) is similar. We proceed by induction on the length l of σ.

Basis: l = 1. Let σ = e and τ = e'. If e is an event for p in S', then p receives
no messages in e'. This event is clearly applicable to D since p has not failed in D.
If e is an event for p in S, then since τ = σ and buff(p,C) ⊆ buff(p,D), the fact
that σ is applicable to C implies that τ is applicable to D.

Induction: l > 1. Suppose the lemma is true for schedules of length l − 1, and
show it for length l. Let σ = σ'e be a schedule of length l. Since σ' has length l − 1,
by the induction hypothesis τ' = deafen(S',σ') is applicable to D. We must show
that e' = deafen(S',e) is applicable to τ'(D) = E. If e is an event for p in S', then
p receives no messages. This event is clearly applicable to E since p has not failed
in D and no subsequent steps are failure steps.

Suppose e = (p, M, b) for p in S. We must show that each m in M is in
buff(p,E). Choose m in M and let q be the sender.

If m is in buff(p,C), then m is in p's buffer in every configuration from C to
σ'(C). Since buff(p,C) ⊆ buff(p,D), and no message is removed from a buffer by τ'
that is not removed by σ', m is still in buff(p,E).

Suppose m is not in buff(p,C). Then by assumption on σ, q is in S. Let σ''g
be the prefix of σ' such that (σ''g)(C) is when m first appears in p's buffer. Thus,
q sends m as a result of event g in run(C,σ'). Since q is in S, τ''g is a prefix of
τ', where τ'' = deafen(S',σ''). By the induction hypothesis, τ'' is applicable to D,
so by Lemma 21, state(q,σ''(C)) = state(q,τ''(D)). By the inductive hypothesis,
since the length of σ''g is less than l, g is applicable to τ''(D). Thus m is also sent
in run(D,τ'), and m is in p's buffer in E. □

The  next  theorem  shows  that  for  any  protocol,  there  is  some  finite  run  that 
computes  the  wrong  decision  value,  if  no  more  than  half  the  processors  are  nonfaulty. 

Theorem 23: There is no t-nonblocking transaction commit protocol if n ≤ 2t.


Proof: Suppose n ≤ 2t and that there is a t-nonblocking transaction commit
protocol with processors 0 through n − 1.

Let A = {0, ..., t − 1} and B = {t, ..., n − 1}. Each of A and B has at most
t elements. The first t events of a cycle form an A-semicycle (each processor in A
takes a step); the remaining events of a cycle form a B-semicycle (each processor in
B takes a step). An infinite schedule applicable to an initial configuration consists
of alternating A- and B-semicycles.
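The partition and the semicycle bookkeeping used in the rest of the proof can be sketched as follows. The helper names are hypothetical; the sketch only restates the definitions: A = {0, ..., t − 1}, B = {t, ..., n − 1}, odd-numbered semicycles π_i are A-semicycles, even-numbered ones are B-semicycles, and π_i lies in cycle ⌈i/2⌉.

```python
from typing import List, Tuple


def partition(n: int, t: int) -> Tuple[List[int], List[int]]:
    """Split processors 0..n-1 into A = {0..t-1} and B = {t..n-1}."""
    return list(range(t)), list(range(t, n))


def both_groups_small(n: int, t: int) -> bool:
    """True exactly when each group has at most t members, i.e. when n <= 2t."""
    A, B = partition(n, t)
    return len(A) <= t and len(B) <= t


def split_cycle(cycle_events: list, t: int) -> Tuple[list, list]:
    """A cycle lists one event per processor in turn order 0..n-1."""
    return cycle_events[:t], cycle_events[t:]   # (A-semicycle, B-semicycle)


def semicycle_info(i: int) -> Tuple[str, int]:
    """Group and cycle number of the i-th semicycle (1-based) of a schedule."""
    group = "A" if i % 2 == 1 else "B"
    return group, (i + 1) // 2                  # (i + 1) // 2 == ceil(i / 2)
```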

Let I_{11} be the initial configuration in which all processors have initial value
1. Since the protocol is a t-nonblocking transaction commit protocol, given an
adversary that kills no processors and delivers in cycle j + 1 any message sent in
cycle j (so every run is failure-free and on-time), there is at least one finite deciding
run run(α, I_{11}) such that all processors have decided 1 in α(I_{11}). Let α = π_1 ... π_y,
where each π_i is a semicycle.

Claim: There exist y + 1 finite failure-free schedules α_1 through α_{y+1} such that
for each i, (1) α_i = π_1 ... π_{i−1}γ_i, (2) α_i is applicable to I_{11}, (3) all processors have
decided 1 in α_i(I_{11}), and (4) no intergroup message is received in γ_i.

Proof of Claim: We show the claim by descending induction on i. Let C_i =
(π_1 ... π_i)(I_{11}) for i ≥ 1, and C_0 = I_{11}.

Basis: i = y + 1. Letting α_{y+1} = α (so that γ_{y+1} is empty) proves the claim.

Induction:  i  <  y  +  1.  We  assume  the  claim  is  true  for  i  +  1  and  show  it  for  i. 

Assume π_i is a B-semicycle, i.e., i is even. (We will indicate in parentheses the
changes necessary when π_i is an A-semicycle, i.e., when i is odd.) If no processor
in B receives any message from a processor in A in π_i, then letting γ_i = π_iγ_{i+1}
satisfies properties (1) through (4).

Suppose some processor in B receives a message from some processor in A in
π_i. We construct γ_i in two steps: first we construct β_1, after which all processors
in A have decided, and then we construct β_2, in which all processors in B decide.
Then γ_i will be β_1β_2.

Define β_1 to be deafen(B, π_iγ_{i+1}). (See Figure 1.) By Lemma 22, β_1 is
applicable to C_{i−1}. Since β_1|A = π_iγ_{i+1}|A, Lemma 21 applies and each processor in A
has the same state in β_1(C_{i−1}) = F as it does in (π_iγ_{i+1})(C_{i−1}), so each decides 1
in F. No intergroup message is received in β_1 because processors in B receive no
messages in β_1, and processors in A receive no intergroup messages in π_iγ_{i+1} or in
β_1.


Figure 1: Construction of β_1. (Diagram: the run I_{11} = C_0, ..., C_{i−2}, C_{i−1}, C_i, ..., C_y = α(I_{11}), drawn as alternating A- and B-semicycles with π_{i−1} marked, and β_1 = deafen(B, π_iγ_{i+1}) applied from C_{i−1}.)


Now we must give a schedule β_2 that causes processors in B to decide 1 without
hearing from any processors in A. The intuition is that processors in B must be
able to decide without hearing from processors in A, because it is possible that all
the processors in A have died. By the agreement condition, the processors in B
must decide 1 also. The problem with applying this argument is that there may be
leftover messages sent by processors in A before the point at which the processors
in B think they died, and thus processors in B could wait to receive these messages
before deciding. Thus, we must show that processors in A might have died even
earlier.

Semicycle π_i is part of cycle number ⌈i/2⌉ = j in α_i. (See Figure 2.) Let D
be the configuration in run(α_i, I_{11}) immediately preceding the (j − 1)st cycle of α_i.
(If j = 1, then let D = I_{11}.) Let τ be the substring of α_i between I_{11} and D. Let
ρ be the substring of α_i between D and C_{i−1}. There are two possibilities for ρ.


•  If i = 2, then D = I_{11} and ρ = π_1. Thus, ρ is an A-semicycle.

•  If i > 2, then D = C_{i−4} and ρ = π_{i−3}π_{i−2}π_{i−1}. Thus, ρ consists of all of cycle
j − 1 and the first half of cycle j (an A-semicycle followed by a B-semicycle
followed by another A-semicycle). (Pictured in Figure 2.)


(If π_i is an A-semicycle, i.e., if i is odd, then there are the following two
possibilities for ρ.

•  If i = 1, then D = I_{11} and ρ is empty.

•  If i > 1, then D = C_{i−3} and ρ = π_{i−2}π_{i−1}. Thus, ρ consists of cycle j − 1 (an
A-semicycle followed by a B-semicycle).)
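The case analysis for D and ρ follows from a single index calculation, sketched below under the convention C_0 = I_{11}. D precedes cycle j − 1, which begins with semicycle π_{2j−3}, so D = C_{2j−4} when j ≥ 2 and D = C_0 when j = 1; writing D = C_d, ρ then consists of π_{d+1}, ..., π_{i−1}. The function name `d_and_rho` is hypothetical.

```python
from typing import List, Tuple


def d_and_rho(i: int) -> Tuple[int, List[int]]:
    """Return (d, rho) with D = C_d and rho = [indices k of the semicycles pi_k in rho]."""
    if i < 1:
        raise ValueError("semicycles are numbered from 1")
    j = (i + 1) // 2                 # pi_i lies in cycle j = ceil(i / 2)
    d = max(0, 2 * j - 4)            # cycle j - 1 starts with pi_{2j-3}; C_0 = I_11
    return d, list(range(d + 1, i))  # rho = pi_{d+1} ... pi_{i-1}
```

For even i > 2 this yields D = C_{i−4} and ρ = π_{i−3}π_{i−2}π_{i−1}; for odd i > 1 it yields D = C_{i−3} and ρ = π_{i−2}π_{i−1}, matching the four cases above.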


Figure 2: Construction of β_2. (Diagram with cycle j − 1 marked.)


Let ρ' = kill(A, ρ). Since no message is sent and received in the same cycle
in α (and hence in ρ), any message received in ρ by a processor p in B from a
processor in A is sent in run(τ, I_{11}), i.e., prior to cycle j − 1, and is in buff(p,D).
By Lemma 22, ρ' is applicable to D. Since ρ|B = ρ'|B, Lemma 21 implies that
state(p, ρ'(D)) = state(p, C_{i−1}) for all p in B.

Consider the schedule β_1' = kill(A, β_1). (See Figure 2.) Since the processors in
A are failed and the processors in B receive no messages, β_1' is obviously applicable to
ρ'(D). Let E = β_1'(ρ'(D)). Since β_1'|B = β_1|B and state(p, ρ'(D)) = state(p, C_{i−1})
for all p in B, Lemma 21 implies that state(p, E) = state(p, F) for all p in B.


By the t-nonblocking property, since |A| ≤ t, there must exist a finite deciding
run from E with schedule δ. Suppose the decision value is v. Thus, all processors
in B decide v in δ(E). By choice of α, all messages sent in run(τ, I_{11}), i.e., before
cycle j − 1, are received by the end of cycle j − 1, i.e., by the end of ρ or earlier.
Since ρ'|B = ρ|B, every processor in B receives in ρ' all messages sent to it in
run(τ, I_{11}), i.e., before cycle j − 1. Thus in δ, processors in B receive only messages
sent in run(ρ'β_1'δ, D). Since all processors in A are dead in ρ'β_1'δ, B receives no
intergroup messages in δ.

Let β_2 = deafen(A, δ). Pick p in B. From above, state(p,E) = state(p,F).
Let m be any message in buff(p,E); m could only have been sent by a processor q
in B in run(ρ'β_1', D), i.e., in cycle j − 1 or later. Lemma 21 implies that q has the
same state in corresponding configurations in run(ρ'β_1', D) and run(ρβ_1, D). Thus
q sends the same messages in the two runs, and m is also in buff(p,F). Now we
can apply Lemma 22 to show that β_2 is applicable to F.

Since β_2|B = δ|B and state(p,F) = state(p,E) for all p in B, Lemma 21
implies that each processor p in B is in the same state in β_2(F) as in δ(E). So
each processor in B decides v in β_2(F); by the agreement condition, v = 1, because
processors in A have already decided 1 in F. No intergroup message is received in
β_2 because none is received in δ.

Let γ_i = β_1β_2. We have shown that α_i = π_1 ... π_{i−1}γ_i satisfies properties (1),
(2), (3) and (4). End of Claim.

Note that α_1 is a finite schedule in which no intergroup messages are received.
Construct schedule σ = kill(A, α_1). By Lemma 22, σ is applicable to I_{11}. Since
σ|B = α_1|B, Lemma 21 implies that each processor in B has the same state in
σ(I_{11}) as it does in α_1(I_{11}), and thus also decides 1 in σ(I_{11}).

Let I_{01} be the initial configuration in which all processors in A have initial
value 0 and all processors in B have initial value 1. By Lemma 22, σ is applicable
to I_{01}. Since each processor in B begins with the same state in I_{01} as in I_{11}, by
Lemma 21 each has the same state in σ(I_{01}) as it does in σ(I_{11}), and thus also
decides 1 in σ(I_{01}). But this violates the abort validity condition. □

5.  Lower  Bound  on  Time 

One  might  imagine  a  transaction  commit  protocol  for  our  model  such  that  each 
processor  could  decide  in  a  constant  number  of  its  own  steps,  at  least  in  many  runs. 


For  instance,  in  the  protocol  presented  in  Section  3,  at  most  6K  steps  are  required 
for  a  processor  to  complete  stage  0  —  a  processor  need  not  wait  arbitrarily  long  for 
messages  since  the  existence  of  a  late  message  means  that  the  processor  is  allowed 
to  abort.  Yet  in  the  subsequent  stages,  no  advantage  is  taken  of  this  flexibility, 
and  processors  wait  potentially  unbounded  time  for  messages.  Unfortunately,  the 
intuition  that  it  may  be  possible  to  use  the  detection  of  late  messages  in  order  to 
shorten  the  running  time  (as  measured  in  processor  steps)  is  incorrect.  In  fact,  in 
this  section  we  prove  that  no  protocol  can  guarantee  that  each  processor  terminate 
in  a  constant  expected  number  of  its  own  steps,  even  if  processors  run  in  lockstep 
synchrony,  and  even  if  only  one  processor  can  fail. 

In  particular,  we  show  that  for  any  constant  B,  there  is  a  1-admissible  adversary 
and  an  initial  configuration  such  that  the  expected  number  of  cycles  needed  for  all 
nonfaulty  processors  to  decide  is  more  than  B.  The  proof  is  constructed  as  follows. 
We  consider  the  initial  configuration  in  which  all  processors  begin  with  1,  and  the 
adversary  that  kills  no  processors  and  delivers  all  messages  with  delay  1.  If  no  run 
from  this  initial  configuration  with  this  adversary  is  deciding  by  cycle  B,  we  are 
done. Suppose there is such a B-cycle run that is deciding. We find a point in this
run that has the property that there are some very long runs extending from this point
that are not deciding. These runs are kept undeciding by delaying the delivery of
all messages. These runs are so long that they cause the expected value to exceed
B, when calculated with the appropriate initial configuration and adversary.

Thus,  we  must  solve  two  subproblems.  First,  we  must  find  the  appropriate 
point  in  the  run  from  which  the  long  runs  branch  off  (cf.  Lemma  24);  second,  we 
must  show  that  the  long  runs  extending  from  this  point  are  undeciding  (cf.  Lemma 
25). 

We  need  the  following  definitions  in  addition  to  the  definitions  and  Lemmas  21 
and  22  from  Section  4. 

If p is a processor, then schedule σ is p-free if p only takes failure steps in σ.

A run is x-slow for some constant x if every message received in the run has
delay at least x. Given a configuration C, a schedule σ is x-slow relative to C if the
run obtained by applying σ to C is x-slow.

A seed (for protocol P) is an n-tuple of sequences of n-bit strings, such that
either each sequence is infinite or each sequence has the same number of elements.
The length of a seed is the length of one sequence. If seed F has infinite length,
then F is infinite; otherwise F is finite. There is a finite number of seeds of any finite length.
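Because each of the n sequences in a seed of length L holds L strings of n bits, there are (2^n)^{nL} = 2^{n²L} seeds of length L. The brute-force enumeration below, a sketch with a hypothetical tuple representation of seeds, checks this count for small n and L.

```python
from itertools import product
from typing import List, Tuple

Seed = Tuple[Tuple[str, ...], ...]  # n sequences, one per processor (assumed encoding)


def all_seeds(n: int, length: int) -> List[Seed]:
    """Enumerate every n-tuple of length-`length` sequences of n-bit strings."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    sequences = list(product(strings, repeat=length))  # one processor's sequence
    return list(product(sequences, repeat=n))          # one sequence per processor


def seed_count(n: int, length: int) -> int:
    return 2 ** (n * n * length)
```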


A run is F-compatible, for seed F, if for all processors p and all i not exceeding
the length of F, the random string that p receives in its ith step is the same as
the ith element of p's sequence in F. Given configuration C, a schedule σ is
F-compatible relative to C if C is reachable by an F-compatible run and run(C,σ) is
F-compatible.
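F-compatibility is a prefix comparison for each processor. The sketch below assumes a hypothetical representation in which `draws[p]` lists the random strings processor p received in its successive steps and the seed is a sequence of per-processor sequences; a processor that has taken fewer steps than the seed's length is compared only on the steps it has actually taken.

```python
from typing import Sequence


def is_compatible(draws: Sequence[Sequence[str]], seed: Sequence[Sequence[str]]) -> bool:
    """True if, for every processor p and every step i covered by both the run
    and the seed, p's i-th random string equals the i-th element of seed[p]."""
    for p, seq in enumerate(seed):
        k = min(len(seq), len(draws[p]))
        if list(draws[p][:k]) != list(seq[:k]):
            return False
    return True
```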

For  the  remainder  of  this  section,  we  fix  an  arbitrary  1-nonblocking  transaction 
commit  protocol  P.  From  now  on,  “run”  means  a  1-admissible  run  of  P,  and 
“configuration”  means  a  configuration  reachable  from  some  initial  configuration  of 
P by a 1-admissible run of P.

Let V be a subset of {0, 1}, x an integer, and F a seed. Configuration C
is (x, F, V)-valent if V is the set of decision values of all configurations that are
reachable from C by an x-slow F-compatible run.
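On an explicitly enumerated, finite set of configurations, the valency of C is the set of decision values found by a graph search from C. The sketch below is illustrative only: it assumes a hypothetical successor map `succ` that already contains only the events allowed in x-slow F-compatible runs, and a map `decision` giving the decision value of each decided configuration.

```python
from typing import Dict, FrozenSet, Hashable, List, Optional


def valency(
    c: Hashable,
    succ: Dict[Hashable, List[Hashable]],
    decision: Dict[Hashable, Optional[int]],
) -> FrozenSet[int]:
    """Decision values of all configurations reachable from c (c included)."""
    seen, stack, values = set(), [c], set()
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        v = decision.get(u)
        if v is not None:
            values.add(v)
        stack.extend(succ.get(u, []))
    return frozenset(values)
```

An empty result corresponds to an (x, F, ∅)-valent configuration, which the 1-nonblocking property rules out when F is finite.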

For the rest of this section, let I_1 be the initial configuration in which all
processors have initial value 1.

The  next  lemma  shows  that  in  an  F-compatible  run  that  decides  1,  there 
exists  a  configuration  from  which  some  F-compatible,  x-slow  run  decides  1,  and 
from  which  some  other  F-compatible,  x-slow  run  decides  0. 

Lemma 24: If run(I_1, τ) is a finite failure-free on-time deciding run that is
F-compatible for finite seed F, then for any integer x > 0 there exists a configuration
in run(I_1, τ) that is (x, F, {0, 1})-valent.

Proof: Pick such a run run(I_1, τ) that is F-compatible, and fix x. By the commit
validity condition, τ(I_1) = C has decision value 1. Thus all runs starting
at C, including x-slow F-compatible runs, have decision value 1, and hence C is
(x, F, {1})-valent.

Let I_{01} be the initial configuration in which some processor q has initial value
0 and the rest have initial value 1. Since the protocol is 1-nonblocking and since F
is finite, there is a finite q-free x-slow F-compatible run run(σ, I_{01}) such that σ(I_{01})
has decision value 0, and by the agreement condition, σ(I_{01}) is (x, F, {0})-valent.

By Lemma 22, σ is also applicable to I_1. By Lemma 21, all processors except
q have the same state in σ(I_1) as in σ(I_{01}), and decide 0 in σ(I_1). Thus I_1 is either
(x, F, {0})-valent or (x, F, {0, 1})-valent. If the latter is true, we are done. Suppose
the former is true.


Since F is finite, by the 1-nonblocking property no configuration in run(I_1, τ)
is (x, F, ∅)-valent. The valencies of I_1 and C imply that there must be an event
e = (p, M, b) and two adjacent configurations in run(I_1, τ), C_0 and C_1 with C_1 =
e(C_0), such that C_0 is either (x, F, {0})-valent or (x, F, {0, 1})-valent, and C_1 is
either (x, F, {1})-valent or (x, F, {0, 1})-valent. (See Figure 3.)
Figure 3: Demonstrating the existence of an (x, F, {0, 1})-valent configuration

If either configuration is (x, F, {0, 1})-valent, we are done. Say neither is. Since
the protocol is 1-nonblocking, F is finite, no processor has failed so far, and C_0 is
(x, F, {0})-valent, there is a finite p-free x-slow F-compatible run run(σ, C_0) in
which the nonfaulty processors decide 0. Say σ = (p, ⊥, b')σ'. (If F is long enough
to extend past C_0, then b' = b; otherwise, b' could differ from b.) Since σ' is
applicable to C_1, Lemma 21 implies that all the processors except p have the same
state in σ'(C_1) as they do in σ(C_0). But since they decide 0 in σ(C_0), and since σ'
is F-compatible and x-slow relative to C_1, this is a contradiction to the hypothesis
that C_1 is (x, F, {1})-valent. □
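The step in the proof of Lemma 24 that locates C_0 and C_1 amounts to a scan along the run. The sketch below assumes the valencies of the configurations of run(I_1, τ) are already available as a list; under the hypotheses of the proof the first valency is {0}, some later one differs, and none is empty, so the first configuration whose valency differs from {0} plays the role of C_1 and its predecessor is (x, F, {0})-valent. The function name is hypothetical.

```python
from typing import FrozenSet, List, Tuple


def adjacent_pair(vals: List[FrozenSet[int]]) -> Tuple[int, int]:
    """Indices (k - 1, k) of the first adjacent configurations where the valency
    stops being {0}; assumes vals[0] == {0} and that some later valency differs."""
    zero = frozenset({0})
    if not vals or vals[0] != zero:
        raise ValueError("expected the run to start at an (x, F, {0})-valent configuration")
    for k in range(1, len(vals)):
        if vals[k] != zero:
            return k - 1, k
    raise ValueError("no configuration with a valency other than {0}")
```

If the returned C_1 is (x, F, {0, 1})-valent, the lemma already holds; the argument above rules out the remaining case, in which C_1 is (x, F, {1})-valent.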

The  next  lemma  shows  that  in  a  certain  situation,  processors  must  remain 
undecided  as  long  as  no  messages  are  received. 

Lemma 25: Let A be the adversary that kills no processors, and that for the first l
events delivers messages after delay 1 and subsequently delivers messages after delay
x, for some x > l. Let F be a seed of length x. If the configuration C following
the lth event in run(A, I_1, F) is (x, F, {0, 1})-valent, then the final configuration in
run(A, I_1, F) is (x, F, {0, 1})-valent.

Proof: Let run(A, I_1, F) = run(ασ, I_1), where C = α(I_1). Assume, for contradiction,
that σ(C) is not (x, F, {0, 1})-valent. Since F is finite, by the 1-nonblocking
property, σ(C) cannot be (x, F, ∅)-valent. Assume σ(C) is (x, F, {w})-valent. Then
there is a configuration D in run(σ, C) and some event e = (p, M, b) in σ such that
D is (x, F, {0, 1})-valent and e(D) is (x, F, {w})-valent. M must be the empty set,
since no messages are received in run(σ, C). Suppose w = 0. (The argument is
analogous if w = 1.) The only other event applicable to D that can be part of
an x-slow F-compatible run is (p, ⊥, b) = e', because all messages sent more than
x cycles ago have delay 1 and have already been received, and because F is long
enough to extend to e. (See Figure 4.)




Figure 4: Demonstrating that σ(C) is (x, F, {0, 1})-valent


Since D is (x, F, {0, 1})-valent, e'(D) must be either (x, F, {0, 1})-valent or
(x, F, {1})-valent. Thus there is some finite p-free x-slow F-compatible run from
e'(D) that has decision value 1; let τ be its schedule. Now τ is also applicable,
x-slow and F-compatible relative to e(D), and all processors except p have the same
state in τ(e(D)) as in τ(e'(D)) (by Lemma 21), so they decide 1, contradicting the
valency of e(D). □
Given infinite run R, let T(R) be the cycle when the last nonfaulty processor
decides.

Theorem 26: For any constant B, there is a 1-admissible adversary A and an
initial configuration I such that E(T_{A,I}) > B.

Proof: Fix B. Let ℛ be the set of all runs of the form run(A_1, I_1, F), where F is
a seed of length B, and A_1 is the adversary that kills no processors and delivers all
messages with delay 1. Let |ℛ| = j. Thus, j is also the number of seeds of length
B.

Case 1: No run in ℛ is deciding. Let A = A_1 and I = I_1. Then E(T_{A,I}) > B.

Case 2: There is some run R in ℛ that is deciding. Let 𝒞 be the set of all
configurations in run R, and let m = |𝒞|. Let 𝒮 be the collection of all seeds with
length jmB that extend the seed of R. 𝒮 is finite; in fact, |𝒮| = z/j, where z is the
total number of seeds of length jmB.
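The count |𝒮| = z/j holds because every seed of length jmB has exactly one prefix of length B, and each of the j seeds of length B is extended by the same number of longer seeds. The brute-force check below is a sketch on tiny parameters, with a hypothetical tuple encoding of seeds (one sequence of n-bit strings per processor); it is independent of the protocol.

```python
from itertools import product
from typing import List, Tuple

Seed = Tuple[Tuple[str, ...], ...]


def all_seeds(n: int, length: int) -> List[Seed]:
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    return list(product(product(strings, repeat=length), repeat=n))


def prefix(seed: Seed, length: int) -> Seed:
    """Truncate every processor's sequence to its first `length` strings."""
    return tuple(seq[:length] for seq in seed)


def count_extensions(n: int, short: int, long: int, base: Seed) -> int:
    """Number of seeds of length `long` whose length-`short` prefix is `base`."""
    return sum(1 for s in all_seeds(n, long) if prefix(s, short) == base)
```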

We will associate each seed in 𝒮 with a configuration in 𝒞 in such a way that
all runs from a configuration in 𝒞, using a particular adversary and any of the
associated seeds, are undeciding. The extreme length of these undeciding runs will
cause the desired expected value to exceed B.

For each C ∈ 𝒞, define 𝒮(C) to be the set of all F ∈ 𝒮 such that C is
the first (jmB, F, {0, 1})-valent configuration in R. By Lemma 24, at least one
(jmB, F, {0, 1})-valent configuration exists in R; thus, each F ∈ 𝒮 is in 𝒮(C) for
exactly one configuration C.

Fix C to be a configuration in 𝒞 with |𝒮(C)| ≥ (1/m)·|𝒮|. Such a configuration
exists by the pigeonhole principle, since |𝒞| = m. Thus, |𝒮(C)| ≥ z/(jm).
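The pigeonhole step can be illustrated directly: if each seed in 𝒮 is assigned to exactly one of the m configurations, some configuration receives at least |𝒮|/m seeds. The sketch assumes a hypothetical dictionary from seeds to configuration indices and uses `collections.Counter` to find a largest class.

```python
from collections import Counter
from typing import Dict, Hashable, Tuple


def most_popular(assignment: Dict[Hashable, int]) -> Tuple[int, int]:
    """Return (configuration, number of seeds assigned to it) for a largest class."""
    config, size = Counter(assignment.values()).most_common(1)[0]
    return config, size
```

Configurations that receive no seeds still count toward m, which is why the bound is stated in terms of |𝒞| rather than the number of nonempty classes.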

Let l be the number of events that precede C in run R. Let A be the adversary
that for the first l events delivers messages after delay 1 and that subsequently
delivers messages after delay jmB. By Lemma 25, for every F in 𝒮(C), the final
configuration of run(A, I_1, F) is (jmB, F, {0, 1})-valent. Thus, no processor has
decided in that final configuration, and T(R') > jmB for any infinite run R' that
is an extension of run(A, I_1, F).

Let I = I_1. By choice of C, at least a 1/(jm) fraction of all the seeds of length
jmB are in 𝒮(C). Thus, at least a 1/(jm) fraction of all infinite seeds have a prefix in
𝒮(C). For any infinite seed F with a prefix in 𝒮(C), T(run(A, I, F)) > jmB, by
the argument above. As a result,

$$E(T_{A,I}) > \frac{1}{jm}\cdot jmB = B. \qquad \square$$
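The final inequality can be unpacked in one more step. The derivation below assumes, as the fraction-of-seeds argument implicitly does, that the infinite seed is drawn uniformly at random; Ω_C is notation introduced here for the event that the seed has a prefix in 𝒮(C). It uses only that T_{A,I} ≥ 0 everywhere and T_{A,I} > jmB on Ω_C.

$$
\begin{aligned}
E(T_{A,I}) &\ge E\big(T_{A,I}\,\mathbf{1}_{\Omega_C}\big) && \text{since } T_{A,I} \ge 0,\\
&> jmB \cdot \Pr(\Omega_C) && \text{since } T_{A,I} > jmB \text{ on } \Omega_C \text{ and } \Pr(\Omega_C) > 0,\\
&= jmB \cdot \frac{|\mathcal{S}(C)|}{z} \;\ge\; jmB \cdot \frac{1}{jm} = B.
\end{aligned}
$$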